This paper focuses on finite minimax problems with many functions, and their solutions by means of
exponential  smoothing.  We  conduct  run-time  complexity  and  rate  of  convergence  analysis  of  smoothing 
algorithms  and  compare  them  with  those  of  SQP  algorithms.  We  find  that  smoothing  algorithms  may  have 
only  sublinear  rate  of  convergence,  but  as  shown  by  our  complexity  results,  their  slow  rate  of  convergence 
may  be  compensated  by  small  computational  work  per  iteration.  We  present  two  smoothing  algorithms  with 
active-set  strategies  that  reduce  the  effect  of  ill-conditioning  using  novel  precision-parameter  adjustment 
schemes.  Numerical  results  indicate  that  the  proposed  algorithms  are  competitive  with  other  smoothing  and 
SQP algorithms, and they are especially efficient for large-scale minimax problems with a significant number
of functions $\epsilon$-active at stationary points.

Key  Words.  Finite  minimax,  exponential  penalty  function,  smoothing  techniques,  active-set  strategy. 


1  Introduction 

There  are  many  applications  that  can  be  expressed  as  finite  minimax  problems  of  the  form 

\[ (P) \qquad \min_{x \in \mathbb{R}^d} \psi(x), \tag{1} \]

where $\psi : \mathbb{R}^d \to \mathbb{R}$ is defined by

\[ \psi(x) = \max_{j \in Q} f^j(x), \tag{2} \]

and $f^j : \mathbb{R}^d \to \mathbb{R}$, $j \in Q = \{1, \ldots, q\}$, $q > 1$, are smooth functions. Minimax problems of the form (P) may occur in engineering design [1], control system design [2], portfolio optimization [3], best polynomial approximation [4], or as subproblems in semi-infinite minimax algorithms [5]. In this paper, we focus on minimax problems with many functions, which may result from finely discretized semi-infinite minimax problems or optimal control problems.

The non-differentiability of the objective function in (P) poses the main challenge for
solving  minimax  problems,  as  the  usual  gradient  methods  cannot  be  applied  directly.  Many 
algorithms  have  been  proposed  to  solve  (P);  see  for  example  [6-8]  and  references  therein. 
One  approach  is  sequential  quadratic  programming  (SQP),  where  (P)  is  first  transcribed 
into  the  standard  nonlinear  constrained  problem 

\[ (P') \qquad \min_{(x,z) \in \mathbb{R}^{d+1}} \{ z \mid f^j(x) - z \le 0,\ j \in Q \} \tag{3} \]

and then a SQP algorithm is applied to solve (P'), advantageously exploiting the special structure in the transcribed problem; see [7, 9]. Other approaches also based on (P') include interior point methods [8, 10, 11] and conjugate gradient methods in conjunction with exact penalties and smoothing [12].

Each  iteration  of  the  SQP  algorithm  in  [7]  solves  two  quadratic  programs  (QPs)  to 
compute  the  main  search  direction  and  a  modified  direction  to  overcome  the  Maratos  effect. 
The SQP algorithm in [7] appears especially promising for problems with many sequentially related functions, as in the case of finely discretized semi-infinite minimax problems, due to its aggressive active-set strategy. Recently, a SQP algorithm was proposed in [9], where
the  modified  direction  is  obtained  by  solving  a  system  of  linear  equations.  This  reduces  the 
number  of  QPs  from  two  to  one  per  iteration,  while  still  retaining  global  convergence  as  well 
as  superlinear  rate  of  convergence.  There  is  no  active-set  strategy  in  [9]. 

In general, an active-set strategy only considers functions that are active or almost
active ($\epsilon$-active) at the current iterate, and thus greatly reduces the number of function and
gradient  evaluations  at  each  iteration  of  an  algorithm.  While  the  number  of  iterations  to 
solve  a  problem  to  required  precision  may  increase,  the  overall  effect  may  be  a  significant 
reduction  in  the  total  number  of  function  and  gradient  evaluations  of  the  algorithm.  The 
numerical  results  for  an  active-set  minimax  algorithm  in  [13]  give  a  75%  reduction  in  the 
number  of  gradient  evaluations,  when  compared  against  the  same  algorithm  without  the 
active-set  strategy.  Significant  reduction  in  computing  time  is  also  reported  for  active-set 
strategies  in  [7]. 

In smoothing algorithms, see for example [6, 12-15], the exponential penalty function introduced in [16] is used to produce a smooth (twice continuously differentiable) function that approximates $\psi(\cdot)$. Since the problem remains unconstrained, one can use any standard unconstrained optimization algorithm to solve the smoothed problem, such as the Armijo Gradient or Newton methods [6] and the Quasi-Newton method [13].

A fundamental challenge of smoothing algorithms is that the smoothed problem becomes increasingly ill-conditioned as the approximation gets more accurate. Hence, an unconstrained optimization solver may experience numerical difficulties and slow convergence. Consequently, the use of smoothing techniques is complicated by the need to balance accuracy of approximation and problem ill-conditioning. An attempt to address these shortcomings was first made in [15], where a precision parameter for the smooth approximation
is  initially  set  to  a  pre-selected  value  and  is  then  increased  by  a  fixed  factor  (specifically  2) 
at  each  consecutive  iteration.  Effectively,  the  algorithm  is  solving  a  sequence  of  gradually 
more  accurate  approximations.  However,  the  main  problem  with  this  open-loop  scheme  is 
its  sensitivity  to  the  selection  of  the  multiplication  factor,  as  can  be  seen  from  the  numerical 
results  in  [6]. 

In [6], the authors propose an adaptive precision-parameter adjustment scheme with exponential smoothing to ensure that the precision parameter is kept small (thus controlling the ill-conditioning) when far from a stationary solution, and is increased as a stationary solution is approached. The authors use the norm of the smoothed function gradient as a proxy for the distance to a stationary solution. When the gradient norm of the smoothed function falls below a user-specified threshold, the precision parameter is increased to a level that ensures that the gradient norm falls within two user-specified bounds. The numerical results show that this adaptive scheme manages ill-conditioning much better than open-loop schemes. The smoothing algorithms in [15] and [6] do not incorporate any active-set strategy.

Using  the  same  adaptive  precision-parameter  adjustment  scheme  as  in  [6],  the  authors 
in  [13]  present  a  new  active-set  strategy  that  can  be  used  in  conjunction  with  exponential 
smoothing  for  specifically  tackling  large-scale  (large  q)  minimax  problems.  We  note  that  the 
convergence result in Theorem 3.3 of [13] may be slightly incorrect as it claims stationarity for all accumulation points of a sequence constructed by their algorithm. However, their proof relies on [6], which guarantees stationarity for only one accumulation point.

While  the  literature  describes  several  smoothing  algorithms  for  (P),  there  appears  to  be 
no  run-time  complexity  and  rate  of  convergence  analysis  of  such  algorithms.  Moreover,  we 
find  no  comprehensive  empirical  comparison  of  run  times  for  SQP  and  smoothing  algorithms. 
In this paper, we present run-time complexity and rate of convergence results for smoothing
algorithms and compare them with those of SQP algorithms. We propose two new active-set
smoothing  algorithms  based  on  [6,  13]  and  present  computational  test  results  for  large-scale 
problem  instances. 

The next section describes the exponential smoothing technique and its properties. Section 3 defines a smoothing algorithm and discusses run-time complexity and rate of convergence. Section 4 presents two new smoothing algorithms and their proofs of convergence. Section 5 contains numerical test results.


2  Exponential  Smoothing 


In  this  section,  we  describe  the  exponential  smoothing  technique,  include  for  completeness 
some  known  results,  and  show  that  the  technique  leads  to  consistent  approximations  (see 
Section  3.3  of  [17]). 

For  ease  of  analysis  of  active-set  strategies,  we  consider  the  problem 


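\[ (P_\Omega) \qquad \min_{x \in \mathbb{R}^d} \psi_\Omega(x), \tag{4} \]

where

\[ \psi_\Omega(x) = \max_{j \in \Omega} f^j(x), \tag{5} \]

and $\Omega \subset Q$. When $\Omega = Q$, $(P_Q)$ is identical to (P). Next, for any $p > 0$ and $\Omega \subset Q$, we consider a smooth approximating problem to $(P_\Omega)$, called the smoothed problem,

\[ (P_{p\Omega}) \qquad \min_{x \in \mathbb{R}^d} \psi_{p\Omega}(x), \tag{6} \]

where

\[ \psi_{p\Omega}(x) = \frac{1}{p} \log\Big( \sum_{j \in \Omega} \exp\big(p f^j(x)\big) \Big) \tag{7} \]

\[ \phantom{\psi_{p\Omega}(x)} = \psi_\Omega(x) + \frac{1}{p} \log\Big( \sum_{j \in \Omega} \exp\big(p (f^j(x) - \psi_\Omega(x))\big) \Big) \tag{8} \]

is the exponential penalty function, with $\log(\cdot)$ denoting the natural logarithm. This smoothing technique was first introduced in [16] and later used in [6, 12-15].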
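As an illustration of why the shifted form (8) is the one usually evaluated in floating point, the following minimal Python sketch computes the smoothed value from a list of function values. The function names and the sample values are assumptions made for this example only; they do not come from the paper.

```python
import math

def psi(fvals):
    """Exact max function psi_Omega(x), given the values f^j(x), j in Omega."""
    return max(fvals)

def psi_smooth(fvals, p):
    """Exponential smoothing psi_{p,Omega}(x), evaluated in the shifted form (8).

    Subtracting the max first keeps every exponent <= 0, so exp() cannot
    overflow even when the precision parameter p is large.
    """
    m = max(fvals)
    return m + math.log(sum(math.exp(p * (f - m)) for f in fvals)) / p

# Made-up function values f^j(x) at one fixed x; two of the four are active.
fvals = [1.0, 0.5, 1.0, -2.0]
for p in (1.0, 10.0, 100.0, 1000.0):
    gap = psi_smooth(fvals, p) - psi(fvals)
    # The gap shrinks like 1/p and never exceeds log(|Omega|) / p.
    print(p, gap, math.log(len(fvals)) / p)
```

Evaluating (7) directly with `p = 1000` and function values of order one would already overflow `math.exp`, which is the practical reason for preferring (8).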

We denote the set of active functions at $x \in \mathbb{R}^d$ by $\hat\Omega(x) = \{ j \in \Omega \mid f^j(x) = \psi_\Omega(x) \}$. Except as specifically stated in Appendix A, we denote components of a vector by superscripts. We also let $\mathbb{N}$ denote the set of positive integers and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$.

The parameter $p > 0$ is the smoothing precision parameter, where a larger $p$ implies higher precision as formalized by the following proposition; see for example [13].

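Proposition 2.1. Suppose that $\Omega \subset Q$ and $p > 0$.

(i) If the functions $f^j(\cdot)$, $j \in \Omega$, are continuous, then $\psi_{p\Omega}(\cdot)$ is continuous and decreases monotonically as $p$ increases.

(ii) For any $x \in \mathbb{R}^d$,

\[ 0 \le \frac{\log|\hat\Omega(x)|}{p} \le \psi_{p\Omega}(x) - \psi_\Omega(x) \le \frac{\log|\Omega|}{p}, \tag{9} \]

where $|\cdot|$ represents the cardinality operator.

(iii) If the functions $f^j(\cdot)$, $j \in \Omega$, are continuously differentiable, then $\psi_{p\Omega}(\cdot)$ is continuously differentiable, with gradient

\[ \nabla\psi_{p\Omega}(x) = \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x), \tag{10} \]

where

\[ \mu_p^j(x) \triangleq \frac{\exp(p f^j(x))}{\sum_{k \in \Omega} \exp(p f^k(x))} = \frac{\exp(p[f^j(x) - \psi_\Omega(x)])}{\sum_{k \in \Omega} \exp(p[f^k(x) - \psi_\Omega(x)])}, \tag{11} \]

and $\sum_{j \in \Omega} \mu_p^j(x) = 1$.

(iv) If the functions $f^j(\cdot)$, $j \in \Omega$, are twice continuously differentiable, then $\psi_{p\Omega}(\cdot)$ is twice continuously differentiable, with Hessian

\[ \nabla^2\psi_{p\Omega}(x) = \sum_{j \in \Omega} \mu_p^j(x) \nabla^2 f^j(x) + p \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x) \nabla f^j(x)^T - p \Big[ \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x) \Big] \Big[ \sum_{j \in \Omega} \mu_p^j(x) \nabla f^j(x) \Big]^T. \tag{12} \]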
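The weights in (11) and the gradient formula (10) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names and the two quadratic test functions are assumptions introduced here, and the weights are computed in the max-shifted form to avoid overflow.

```python
import math

def weights(fvals, p):
    """Weights mu_p^j(x) from (11), computed in the max-shifted form."""
    m = max(fvals)
    e = [math.exp(p * (f - m)) for f in fvals]
    s = sum(e)
    return [ej / s for ej in e]

def grad_psi_smooth(fvals, grads, p):
    """Gradient (10): sum_j mu_p^j(x) * grad f^j(x); grads[j] is a list of length d."""
    mu = weights(fvals, p)
    d = len(grads[0])
    return [sum(mu[j] * grads[j][k] for j in range(len(grads))) for k in range(d)]

# Hypothetical example: f^1(x) = x1^2 + x2^2 and f^2(x) = (x1 - 1)^2 + x2^2
# evaluated at x = (0.25, 1).
x = (0.25, 1.0)
fvals = [x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + x[1] ** 2]
grads = [[2 * x[0], 2 * x[1]], [2 * (x[0] - 1.0), 2 * x[1]]]

mu = weights(fvals, p=5.0)          # nonnegative, sums to one
g = grad_psi_smooth(fvals, grads, p=5.0)
print(mu, g)
```

As `p` grows, the weights concentrate on the active functions, so the gradient of the smoothed function approaches a convex combination of the active gradients.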


Assumption 2.1. We assume that the functions $f^j(\cdot)$, $j \in Q$, are twice continuously differentiable. □

The next lemma can be deduced from Lemma 2.2 of [6].

Lemma 2.1. Suppose that Assumption 2.1 holds. Then, for every bounded set $S \subset \mathbb{R}^d$, there exists an $L < \infty$ such that

\[ \langle y, \nabla^2\psi_{p\Omega}(x) y \rangle \le pL\|y\|^2, \tag{13} \]

for all $x \in S$, $y \in \mathbb{R}^d$, $\Omega \subset Q$, and $p > 1$.

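A continuous, nonpositive optimality function for $(P_\Omega)$ is given by

\[ \theta_\Omega(x) \triangleq -\min_{\mu \in \Sigma_\Omega} \Big\{ \sum_{j \in \Omega} \mu^j \big(\psi_\Omega(x) - f^j(x)\big) + \frac{1}{2} \Big\| \sum_{j \in \Omega} \mu^j \nabla f^j(x) \Big\|^2 \Big\}, \tag{14} \]

where

\[ \Sigma_\Omega \triangleq \Big\{ \mu \in \mathbb{R}^{|\Omega|} \;\Big|\; \mu^j \ge 0 \text{ for all } j \in \Omega,\ \sum_{j \in \Omega} \mu^j = 1 \Big\}, \tag{15} \]

which results in the following optimality condition for $(P_\Omega)$; see Theorems 2.1.1, 2.1.3, and 2.1.6 of [17].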

Proposition 2.2. Suppose that Assumption 2.1 holds and that $\Omega \subset Q$. If $x^* \in \mathbb{R}^d$ is a local minimizer for $(P_\Omega)$, then $\theta_\Omega(x^*) = 0$. □

The continuous, nonpositive optimality function

\[ \theta_{p\Omega}(x) \triangleq -\frac{1}{2} \|\nabla\psi_{p\Omega}(x)\|^2 \tag{16} \]

characterizes stationary points of $(P_{p\Omega})$ as stated in the next proposition; see Proposition 1.1.6 in [17].

Proposition 2.3. Suppose that Assumption 2.1 holds, $p > 0$, and $\Omega \subset Q$. If $x^* \in \mathbb{R}^d$ is a local minimizer for $(P_{p\Omega})$, then $\theta_{p\Omega}(x^*) = 0$. □

We next show that the exponential smoothing technique leads to consistent approximations (see Section 3.3 in [17]), which ensures that globally and locally optimal points as well as stationary points of $(P_{p\Omega})$ converge to corresponding points of $(P_\Omega)$, as $p \to \infty$. Consistent approximations also facilitate the construction of implementable algorithms for (P); see Algorithm 4.1 below.

We define consistent approximation as on page 399 of [17].

Definition 2.1. For any $\Omega \subset Q$, $p > 0$, we say that the pair $((P_{p\Omega}), \theta_{p\Omega}(\cdot))$ is a consistent approximation to $((P_\Omega), \theta_\Omega(\cdot))$ if (i) $(P_{p\Omega})$ epi-converges to $(P_\Omega)$, as $p \to \infty$, and (ii) for any sequences $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ and $\{p_i\}_{i=0}^\infty$, $p_i > 0$ for all $i$, and $x^* \in \mathbb{R}^d$ such that $x_i \to x^*$ and $p_i \to \infty$, as $i \to \infty$, $\limsup_{i \to \infty} \theta_{p_i\Omega}(x_i) \le \theta_\Omega(x^*)$. □

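Theorem 2.1. Suppose that Assumption 2.1 holds, $p > 0$, and $\Omega \subset Q$. Then, the pair $((P_{p\Omega}), \theta_{p\Omega}(\cdot))$ is a consistent approximation to $((P_\Omega), \theta_\Omega(\cdot))$.

Proof. We follow the proofs of Lemmas 4.3 and 4.4 in [18], but simplify the arguments as [18] deals with min-max-min problems. By Theorem 3.3.2 of [17], Proposition 2.1 (ii), and the continuity of $\psi_\Omega(\cdot)$, it follows that $(P_{p\Omega})$ epi-converges to $(P_\Omega)$, as $p \to \infty$.

We next consider the optimality functions. Let $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ and $\{p_i\}_{i=0}^\infty$, $p_i > 0$ for all $i$, be arbitrary sequences and $x^* \in \mathbb{R}^d$ be such that $x_i \to x^*$ and $p_i \to \infty$, as $i \to \infty$. Since $\mu_p^j(x) \in (0,1)$ for any $j \in \Omega$, $p > 0$, and $x \in \mathbb{R}^d$, $\{\mu_{p_i}(x_i)\}_{i=0}^\infty$ is a bounded sequence in $\mathbb{R}^{|\Omega|}$ with at least one convergent subsequence. Hence, for every such subsequence $K \subset \mathbb{N}_0$, there exists a $\mu_\infty \in \Sigma_\Omega$ such that $\mu_{p_i}(x_i) \to^K \mu_\infty$, as $i \to \infty$. Moreover, since $\mu_\infty \in \Sigma_\Omega$, $\sum_{j \in \Omega} \mu_\infty^j = 1$.

If $j \notin \hat\Omega(x^*)$, then there exist a $t > 0$ and $i_0 \in \mathbb{N}$ such that $f^j(x_i) - \psi_\Omega(x_i) \le -t$ for all $i > i_0$. Hence, from (11), $\mu_{p_i}^j(x_i) \to 0$, as $i \to \infty$, and therefore $\mu_\infty^j = 0$. By continuity of $\nabla f^j(\cdot)$, $j \in \Omega$,

\[ \theta_{p_i\Omega}(x_i) \to^K -\frac{1}{2} \Big\| \sum_{j \in \Omega} \mu_\infty^j \nabla f^j(x^*) \Big\|^2 \triangleq \theta_{\infty\Omega}(x^*), \tag{17} \]

as $i \to \infty$. Since $\mu_\infty \in \Sigma_\Omega$ and $\mu_\infty^j = 0$ for all $j \notin \hat\Omega(x^*)$, we find in view of (14) that

\[ \theta_{\infty\Omega}(x^*) = -\sum_{j \in \Omega} \mu_\infty^j \big(\psi_\Omega(x^*) - f^j(x^*)\big) - \frac{1}{2} \Big\| \sum_{j \in \Omega} \mu_\infty^j \nabla f^j(x^*) \Big\|^2 \le \theta_\Omega(x^*). \tag{18} \]

This completes the proof. □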

3  Run-Time  Complexity  and  Rate  of  Convergence 

In  this  section,  we  focus  on  the  run-time  complexity  and  rate  of  convergence  of  smoothing 
algorithms.  Specifically,  we  deal  with  the  following  simple  smoothing  algorithm  for  solving 
(P) based on application of the Armijo Gradient Method¹ to $(P_{pQ})$.

¹The Armijo Gradient Method uses the steepest descent search direction and the Armijo stepsize rule to solve an unconstrained problem; see for example Algorithm 1.3.3 of [17].

Algorithm 3.1. Smoothing Armijo Gradient Algorithm

Data: $t > 0$, $x_0 \in \mathbb{R}^d$.

Parameter: $\delta \in (0,1)$.

Step 1. Set $p^* = \log q / ((1 - \delta)t)$.

Step 2. Generate a sequence $\{x_i\}_{i=0}^\infty$ by applying Armijo Gradient Method to $(P_{p^*Q})$. □
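To make the two steps concrete, here is a minimal Python sketch of Algorithm 3.1 for a scalar variable. Everything beyond the two steps is an assumption of this sketch: the test problem, the Armijo parameters, the backtracking form of the stepsize rule, and the gradient-based stopping test (the algorithm as stated generates an infinite sequence).

```python
import math

def smoothed(fs, x, p):
    """Return psi_pQ(x) and its derivative; fs is a list of (f, df) pairs, x a scalar."""
    vals = [f(x) for f, _ in fs]
    m = max(vals)
    e = [math.exp(p * (v - m)) for v in vals]       # shifted exponents, see (8)
    s = sum(e)
    value = m + math.log(s) / p
    deriv = sum(ej / s * df(x) for ej, (_, df) in zip(e, fs))  # (10)-(11)
    return value, deriv

def smoothing_armijo_gradient(fs, x0, t, delta=0.5, alpha=0.5, beta=0.8,
                              grad_tol=1e-6, max_iter=10000):
    p_star = math.log(len(fs)) / ((1.0 - delta) * t)    # Step 1
    x = x0
    for _ in range(max_iter):                            # Step 2
        value, g = smoothed(fs, x, p_star)
        if abs(g) <= grad_tol:                           # assumed stopping test
            break
        lam = 1.0
        # Backtrack until the Armijo sufficient-decrease test holds.
        while lam > 1e-12 and smoothed(fs, x - lam * g, p_star)[0] - value > -alpha * lam * g * g:
            lam *= beta
        x -= lam * g
    return x, p_star

# Hypothetical test problem: f^j(x) = (x - a_j)^2, so psi* = 1 at x = 0.
centers = (-1.0, 0.5, 1.0)
fs = [(lambda x, a=a: (x - a) ** 2, lambda x, a=a: 2.0 * (x - a)) for a in centers]
x_final, p_star = smoothing_armijo_gradient(fs, x0=3.0, t=0.1)
psi_final = max(f(x_final) for f, _ in fs)
print(x_final, p_star, psi_final - 1.0)
```

The precision parameter is fixed once from the tolerance `t`, which is exactly what makes this variant easy to analyze and also what exposes it to ill-conditioning for small `t`.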

We denote the optimal value of (P) (when it exists) by $\psi^*$, the optimal value of $(P_{pQ})$ (when it exists) by $\psi^*_{pQ}$ for any $p > 0$, and the optimal solution of $(P_{pQ})$ (when it exists) by $x^*_{pQ}$. Algorithm 3.1 has the following property.

Proposition 3.1. Suppose that Step 2 of Algorithm 3.1 has generated a point $x_i \in \mathbb{R}^d$ such that $\psi_{p^*Q}(x_i) - \psi^*_{p^*Q} \le \delta t$. Then, $\psi(x_i) - \psi^* \le t$.

Proof. The result follows directly from (9) and the selection of $p^*$. □
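For completeness, the one-line proof can be expanded as follows. This expansion is our own reading of the argument, using only the bounds in (9) with $\Omega = Q$ and the choice $p^* = \log q/((1-\delta)t)$ from Step 1.

```latex
\begin{align*}
\psi(x_i)
  &\le \psi_{p^*Q}(x_i)
     && \text{lower bound in (9)} \\
  &\le \psi^*_{p^*Q} + \delta t
     && \text{hypothesis of Proposition 3.1} \\
  &\le \psi^* + \frac{\log q}{p^*} + \delta t
     && \text{upper bound in (9), minimized over } x \\
  &= \psi^* + (1-\delta)t + \delta t = \psi^* + t
     && \text{since } p^* = \frac{\log q}{(1-\delta)t}.
\end{align*}
```

The parameter $\delta$ therefore splits the tolerance $t$ into a share $\delta t$ for solving the smoothed problem inexactly and a share $(1-\delta)t$ for the smoothing error itself.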

Proposition 3.1 shows that we can obtain a near-optimal solution of (P) by approximately solving $(P_{pQ})$ for a sufficiently large $p$. As discussed in Section 1, Algorithm 3.1 will
be  prone  to  ill-conditioning.  Adaptive  schemes  for  adjusting  the  precision  parameter  p  and 
the  use  of  another  method  in  Step  2  may  perform  better  in  practice.  However,  the  following 
study  of  run-time  complexity  and  rate  of  convergence  of  Algorithm  3.1  provides  fundamental 
insights  into  smoothing  algorithms  in  general. 

We  start  with  some  intermediate  results  that  utilize  the  following  convexity  assumption. 

Assumption 3.1. Suppose that $f^j(\cdot)$, $j \in Q$, are twice continuously differentiable and there exist $0 < m \le M < \infty$ such that

\[ m\|y\|^2 \le \langle y, \nabla^2 f^j(x) y \rangle \le M\|y\|^2, \tag{19} \]

for all $x, y \in \mathbb{R}^d$ and for all $j \in Q$. □

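Lemma 3.1. Suppose that Assumption 3.1 holds. For any $x, y \in \mathbb{R}^d$ and $p > 0$,

\[ m\|y\|^2 \le \langle y, \nabla^2\psi_{pQ}(x) y \rangle. \tag{20} \]

Proof. From (12) and (19), we obtain that

\[
\begin{aligned}
\langle y, \nabla^2\psi_{pQ}(x) y \rangle
&= \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla^2 f^j(x) y \rangle + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \nabla f^j(x)^T y \rangle \\
&\quad - p \Big\langle y, \Big[ \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big] \Big[ \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big]^T y \Big\rangle \\
&= \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla^2 f^j(x) y \rangle + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \rangle^2 - p \Big\langle y, \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big\rangle^2 \\
&\ge m\|y\|^2 + p \sum_{j \in Q} \mu_p^j(x) \langle y, \nabla f^j(x) \rangle^2 - p \Big\langle y, \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big\rangle^2.
\end{aligned}
\]

Hence, we only need to show that the difference of the last two terms is nonnegative. Let $g : \mathbb{R}^d \to \mathbb{R}$ be defined as $g(z) = \langle y, z \rangle^2$. The function $g$ is a composition of a convex function with a linear function, so it is convex; see for example Proposition 2.1.5 of [19]. Hence, it follows from Jensen's inequality (see for example page 6 of [19]) that

\[ \sum_{j \in Q} \mu_p^j(x)\, g(\nabla f^j(x)) \ge g\Big( \sum_{j \in Q} \mu_p^j(x) \nabla f^j(x) \Big). \tag{21} \]

Since $p > 0$, the result follows.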
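The Jensen step (21) says that a weighted variance of the directional derivatives $\langle y, \nabla f^j(x) \rangle$ is nonnegative. The short Python sketch below evaluates that difference for given weights and gradients; the function name and the numbers are illustrative assumptions, not data from the paper.

```python
def curvature_gap(mu, grads, y):
    """Return sum_j mu_j <y, g_j>^2 - <y, sum_j mu_j g_j>^2.

    This is the difference shown to be nonnegative via (21); it equals the
    mu-weighted variance of the inner products <y, g_j>.
    """
    inner = [sum(yk * gk for yk, gk in zip(y, g)) for g in grads]
    mean = sum(m * s for m, s in zip(mu, inner))   # <y, sum_j mu_j g_j> by linearity
    return sum(m * s * s for m, s in zip(mu, inner)) - mean * mean

# Made-up weights on the unit simplex, gradients in R^2, and a direction y.
mu = [0.2, 0.3, 0.5]
grads = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
y = [1.0, -1.0]
gap = curvature_gap(mu, grads, y)
print(gap)
```

The gap vanishes when all gradients have the same component along `y`, which is the case where smoothing adds no extra curvature in that direction.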


Proposition 3.2. Suppose that Assumption 3.1 holds and $p > 1$. Then, the rate of convergence for the Armijo Gradient Method to solve $(P_{pQ})$ is linear with coefficient $1 - k/p$, for some $k \in (0,1)$. That is, for any sequence $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ generated by the Armijo Gradient Method when applied to $(P_{pQ})$,

\[ \psi_{pQ}(x_{i+1}) - \psi^*_{pQ} \le \Big(1 - \frac{k}{p}\Big) \big[ \psi_{pQ}(x_i) - \psi^*_{pQ} \big] \quad \text{for all } i \in \mathbb{N}_0. \tag{22} \]

Proof. Based on Lemmas 2.1 and 3.1, for every bounded set $S \subset \mathbb{R}^d$ there exists an $L \in [m, \infty)$ such that

\[ m\|y\|^2 \le \langle y, \nabla^2\psi_{pQ}(x) y \rangle \le pL\|y\|^2, \tag{23} \]

for all $x \in S$, $y \in \mathbb{R}^d$, and $p > 1$. Hence, we deduce from Theorem 1.3.7 of [17] that the rate of convergence for Armijo Gradient Method to solve $(P_{pQ})$ is

\[ 1 - \frac{4m\beta\alpha(1 - \alpha)}{pL} \in (0,1), \tag{24} \]

where $\alpha, \beta \in (0,1)$ are the Armijo line search parameters. Hence, $k = 4m\beta\alpha(1 - \alpha)/L$, which is less than unity because $\alpha(1 - \alpha) \in (0, 1/4]$. □

In  order  to  analyze  the  run-time  complexity  of  Algorithm  3.1,  we  need  an  assumption 
on  the  complexity  of  function  and  gradient  evaluations. 

Assumption 3.2. We assume that there exist constants $c, c' < \infty$ such that for any $d \in \mathbb{N}$, $j \in Q$, and $x \in \mathbb{R}^d$, the computational work to evaluate $f^j(x)$ and $\nabla f^j(x)$ is no larger than $cd$ and $c'd^2$, respectively. □

Assumption 3.2 holds for all problem instances considered in this paper (see Appendix A) and appears reasonable for many practical situations. The following result can easily be modified to account for other assumptions about work per function and gradient evaluation.


Theorem 3.1. Suppose that Assumptions 3.1 and 3.2 hold. For any tolerance $t \in (0, \log q)$, there exists a constant $c_t < \infty$ such that the computational work in Algorithm 3.1 to generate $\{x_i\}_{i=0}^n$, with the last iterate satisfying $\psi(x_n) - \psi^* \le t$, is no larger than $c_t q d^2 \log q$.

Proof. Since $p^* = \log q / ((1 - \delta)t) > 1$ for $t \in (0, \log q)$, Proposition 3.2 applies and we find that the number of iterations of the Armijo Gradient Method to obtain $\{x_i\}_{i=0}^n$ such that $\psi_{p^*Q}(x_n) - \psi^*_{p^*Q} \le \delta t$ is no larger than

\[ \left\lceil \frac{\log(\delta t / t_0)}{\log(1 - k/p^*)} \right\rceil, \tag{25} \]

where $k$ is as in Proposition 3.2, $t_0 = \psi_{p^*Q}(x_0) - \psi^*_{p^*Q}$, and $\lceil \cdot \rceil$ denotes the ceiling operator.

Since the main computational work in each iteration for the Armijo Gradient Method is to determine $\nabla\psi_{p^*Q}(x_i)$, see (10), it follows by Assumption 3.2 that there exists a $c^* < \infty$ such that the computational work in each iteration of the Armijo Gradient Method when applied to $(P_{p^*Q})$ is no larger than $c^* q d^2$. Hence, the computational work in Algorithm 3.1 to generate $\{x_i\}_{i=0}^n$, with $\psi_{p^*Q}(x_n) - \psi^*_{p^*Q} \le \delta t$, is no larger than

\[ c^* q d^2 \left\lceil \frac{\log(\delta t / t_0)}{\log(1 - k/p^*)} \right\rceil. \tag{26} \]

Since $p^* = \log q / ((1 - \delta)t)$, it follows from Proposition 3.1 that the computational work in Algorithm 3.1 to generate $\{x_i\}_{i=0}^n$, with $\psi(x_n) - \psi^* \le t$, is no larger than

\[ c^* q d^2 \left\lceil \frac{\log(\delta t / t_0)}{\log\big(1 - \frac{k(1 - \delta)t}{\log q}\big)} \right\rceil \le c^* q d^2 \left\lceil \frac{-\log(\delta t / t_0)}{\frac{k(1 - \delta)t}{\log q}} \right\rceil, \tag{27} \]

where we use the fact that $|\log x| \ge |x - 1|$ for $x \in (0, 1]$. The result then follows. □

Focusing on $q$, we see from Theorem 3.1 and its proof that the number of iterations for Algorithm 3.1 to achieve a near-optimal solution of (P) is $O(\log q)$. Moreover, the run-time complexity of Algorithm 3.1 to achieve a near-optimal solution of (P) is $O(q \log q)$.

For comparison, we next consider the run-time complexity of a SQP algorithm to achieve a near-optimal solution of (P). The main computational work in an iteration of a SQP algorithm involves solving a convex QP with $d+1$ variables and $q$ inequality constraints [7]. Introducing slack variables to convert into standard form, this subproblem becomes a convex QP with $d+1+q$ variables and $q$ equality constraints. Based on [20], the number of operations to solve the converted QP is $O((d+1+q)^3)$. Assuming that the number of iterations a SQP algorithm needs to achieve a near-optimal solution of (P) is $O(1)$, and again focusing on $q$, the run-time complexity of a SQP algorithm to achieve a near-optimal solution of (P) is no better than $O(q^3)$. This complexity, when compared with $O(q \log q)$ of Algorithm 3.1, indicates that smoothing algorithms may be more efficient than SQP algorithms for minimax problems with many functions.
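The logarithmic growth of the iteration count in $q$ can be checked numerically from the linear rate (22): reducing the initial smoothed gap $t_0$ below $\delta t$ takes at most $\lceil \log(\delta t/t_0)/\log(1 - k/p^*) \rceil$ iterations. The Python sketch below evaluates this count; the values chosen for $k$, $t_0$, and the per-evaluation constant are purely hypothetical, since the paper only asserts their existence.

```python
import math

def iteration_bound(q, t, t0, k, delta=0.5):
    """Iterations implied by the linear rate (22) to reach a smoothed gap of delta*t.

    Requires 0 < t < log(q) so that p* > 1, and delta*t < t0.
    """
    p_star = math.log(q) / ((1.0 - delta) * t)
    return math.ceil(math.log(delta * t / t0) / math.log(1.0 - k / p_star))

def work_bound(q, d, t, t0, k, c_star=1.0, delta=0.5):
    """Per-iteration work c* q d^2 (Assumption 3.2) times the iteration bound."""
    return c_star * q * d ** 2 * iteration_bound(q, t, t0, k, delta)

# Hypothetical constants: k = 0.5, initial gap t0 = 10, tolerance t = 0.1.
for q in (10, 1000, 100000):
    n = iteration_bound(q, t=0.1, t0=10.0, k=0.5)
    print(q, n, n / math.log(q))   # the last column stays roughly constant
```

Under these assumed constants the ratio of the iteration count to $\log q$ is essentially flat, so the total work grows like $q \log q$, in contrast to the cubic cost of a single dense QP solve.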

Next, we consider the rate of convergence for Algorithm 3.1. Suppose that Assumption 3.1 holds and that Step 2 of Algorithm 3.1 has generated a sequence $\{x_i\}_{i=0}^n$. Then, in view of (9) and Proposition 3.2,

\[
\begin{aligned}
\psi(x_n) - \psi^*
&\le \psi_{p^*Q}(x_n) - \psi^*_{p^*Q} + \frac{\log q}{p^*} \\
&\le \Big(1 - \frac{k}{p^*}\Big)^n \Big[ \psi(x_0) + \frac{\log q}{p^*} - \psi(x^*_{p^*Q}) \Big] + \frac{\log q}{p^*} \\
&\le \Big(1 - \frac{k}{p^*}\Big)^n \big[ \psi(x_0) - \psi^* \big] + \frac{2\log q}{p^*},
\end{aligned}
\tag{28}
\]

where $k$ is as in Proposition 3.2. We examine the rate at which $\psi(x_n) - \psi^*$ vanishes as $n \to \infty$. As is clear from the right-hand side of (28), $\psi(x_n) - \psi^*$ may not vanish if $p^*$ is a constant as $n \to \infty$. Hence, $p^*$ should be large when $n$ is large. Let $e_0 = \psi(x_0) - \psi^*$ and, for any $n \in \mathbb{N}$ and $p_n > 1$, let

\[ e_n \triangleq e_0 \Big(1 - \frac{k}{p_n}\Big)^n + \frac{2\log q}{p_n}. \tag{29} \]

In view of (28), the quantity $e_n$ is an upper bound on $\psi(x_n) - \psi^*$ when $p^* = p_n$ in Algorithm 3.1.

Before we present the rate of convergence results for Algorithm 3.1, we need the following
trivial technical result.


Lemma 3.2. For $x \in [0, 1/2]$,

\[ -2x \le \log(1 - x) \le -x. \tag{30} \]


We  next  state  the  rate  of  convergence  of  Algorithm  3.1,  which  shows  that  the  rate  is  no 
better  than  sublinear  even  for  an  “optimal”  choice  of  p* . 

Theorem 3.2. Suppose that Assumption 3.1 holds. Let $\{p_n\}_{n=1}^\infty$, with $p_n > 1$, $n \in \mathbb{N}$, be a sequence of precision parameters and, for any $n \in \mathbb{N}$, let $\{x_i\}_{i=0}^n \subset \mathbb{R}^d$ be a sequence generated by Algorithm 3.1 with $p^* = p_n$. Then,

\[ \liminf_{n \to \infty} \frac{\log e_n}{\log n} \ge -1. \tag{31} \]

If $p_n = \zeta n / \log n$ for all $n \in \mathbb{N}$, with $\zeta \in (0, k]$, where $k$ is as in Proposition 3.2, then

\[ \lim_{n \to \infty} \frac{\log e_n}{\log n} = -1. \tag{32} \]

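Proof. For any $n \in \mathbb{N}$, we see from (29) that

\[
\begin{aligned}
\log e_n &= \log\Big( \exp\Big[ \log e_0 + n \log\Big(1 - \frac{k}{p_n}\Big) \Big] + \frac{2\log q}{p_n} \Big) \\
&\ge \log\Big( \max\Big\{ \exp\Big[ \log e_0 + n \log\Big(1 - \frac{k}{p_n}\Big) \Big], \frac{2\log q}{p_n} \Big\} \Big) \\
&= \max\Big\{ \log\Big( \exp\Big[ \log e_0 + n \log\Big(1 - \frac{k}{p_n}\Big) \Big] \Big), \log\Big( \frac{2\log q}{p_n} \Big) \Big\}.
\end{aligned}
\]

Hence, for any $n \in \mathbb{N}$, $n > 1$,

\[ \frac{\log e_n}{\log n} \ge \max\Big\{ \frac{\log e_0}{\log n} + \frac{n \log(1 - k/p_n)}{\log n},\ -\frac{\log p_n}{\log n} + \frac{\log 2}{\log n} + \frac{\log\log q}{\log n} \Big\}. \tag{33} \]

Let $\epsilon > 0$ be arbitrary. Then, there exists an $n_0 \in \mathbb{N}$ such that $\log\log q / \log n \ge -\epsilon$ for all $n \ge n_0$. If $\log p_n / \log n \le 1$ and $n \ge \max\{2, n_0\}$, then

\[ \frac{\log e_n}{\log n} \ge -\frac{\log p_n}{\log n} + \frac{\log 2}{\log n} + \frac{\log\log q}{\log n} \ge -\frac{\log p_n}{\log n} - \epsilon \ge -1 - \epsilon. \tag{34} \]

Alternatively, suppose that $\log p_n / \log n > 1$. Hence, $n/p_n < 1$, and if $n \ge 2k$, then $k/p_n \in (0, 1/2]$. Based on Lemma 3.2 and (33),

\[ \frac{\log e_n}{\log n} \ge \frac{\log e_0}{\log n} + \frac{n \log(1 - k/p_n)}{\log n} \ge \frac{\log e_0}{\log n} - \frac{2k}{\log n} \cdot \frac{n}{p_n} \ge \frac{\log e_0}{\log n} - \frac{2k}{\log n} \tag{35} \]

for all $n \ge 2k$ such that $\log p_n / \log n > 1$. Thus, there exists $n_1 \ge \max\{n_0, 2k\}$ such that

\[ \frac{\log e_0}{\log n} - \frac{2k}{\log n} \ge -1 - \epsilon \tag{36} \]

for all $n \ge n_1$. Hence, for all $n \ge n_1$,

\[ \frac{\log e_n}{\log n} \ge -1 - \epsilon. \tag{37} \]

Since $\epsilon$ is arbitrary, (31) then follows.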

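Next, we will prove the second part of the theorem. Since $p_n = \zeta n / \log n$, where $\zeta \in (0, k]$, and from (29),

\[ \log e_n = \log\Big( \exp\Big[ \log e_0 + n \log\Big(1 - \frac{k \log n}{\zeta n}\Big) \Big] + \frac{2 \log q \log n}{\zeta n} \Big). \tag{38} \]

There exists $n_2 \in \mathbb{N}$ such that $k \log n / (\zeta n) \in [0, 1/2]$ for all $n \ge n_2$. Thus, by Lemma 3.2,

\[
\begin{aligned}
\log\Big( \exp\Big[ \log e_0 + n \Big( -\frac{2k \log n}{\zeta n} \Big) \Big] + \frac{2 \log q \log n}{\zeta n} \Big)
&\le \log e_n \\
&\le \log\Big( \exp\Big[ \log e_0 + n \Big( -\frac{k \log n}{\zeta n} \Big) \Big] + \frac{2 \log q \log n}{\zeta n} \Big)
\end{aligned}
\tag{39}
\]

for all $n \ge n_2$. We first consider the lower bound in (39),

\[
\begin{aligned}
\log\Big( \exp\Big[ \log e_0 + n \Big( -\frac{2k \log n}{\zeta n} \Big) \Big] + \frac{2 \log q \log n}{\zeta n} \Big)
&= \log\Big( \frac{2 \log q \log n}{\zeta n} \Big[ \frac{\zeta n \exp(\log e_0 + \log n^{-2k/\zeta})}{2 \log q \log n} + 1 \Big] \Big) \\
&= \log\Big( \frac{2 \log q \log n}{\zeta n} \Big) + \log\Big( \frac{e_0 \zeta n^{1 - 2k/\zeta}}{2 \log q \log n} + 1 \Big).
\end{aligned}
\tag{40}
\]

Since $\zeta \in (0, k]$ and by continuity of the $\log(\cdot)$ function,

\[ \lim_{n \to \infty} \log\Big( \frac{e_0 \zeta n^{1 - 2k/\zeta}}{2 \log q \log n} + 1 \Big) = 0. \tag{41} \]

Continuing from (40), and using (41), we obtain that

\[
\begin{aligned}
\lim_{n \to \infty} \frac{1}{\log n} \Big[ \log\Big( \frac{2 \log q \log n}{\zeta n} \Big) + \log\Big( \frac{e_0 \zeta n^{1 - 2k/\zeta}}{2 \log q \log n} + 1 \Big) \Big]
&= \lim_{n \to \infty} \frac{1}{\log n} \log\Big( \frac{2 \log q \log n}{\zeta n} \Big) + \lim_{n \to \infty} \frac{1}{\log n} \log\Big( \frac{e_0 \zeta n^{1 - 2k/\zeta}}{2 \log q \log n} + 1 \Big) \\
&= \lim_{n \to \infty} \frac{\log 2 + \log\log q + \log\log n - \log\zeta - \log n}{\log n} \\
&= -1.
\end{aligned}
\tag{42}
\]

Similar arguments lead to the result that the upper bound in (39) also tends to $-1$, as $n \to \infty$. Hence, the conclusion follows. □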

Theorem 3.2 implies that for large $n$, $e_n$ is no smaller than approximately $1/n$ for any choice of the precision parameter $p_n$. Moreover, with the “optimal” choice of $p_n = \zeta n / \log n$, $e_n \sim 1/n$. Hence, the rate of convergence of $e_n$ is sublinear. Since $e_n$ is an upper bound on the distance to the optimal value after $n$ iterations of Algorithm 3.1 with $p^* = p_n$, Algorithm 3.1 has rate of convergence no better than sublinear as the next result formalizes.
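The behaviour of the bound $e_n$ in (29) is easy to tabulate. The Python sketch below is only a numerical illustration: the constants $e_0$, $k$, $q$, and $\zeta$ are hypothetical, and the power $(1 - k/p_n)^n$ is evaluated through `math.log1p` because $k/p_n$ falls below machine precision for very large $n$.

```python
import math

def e_bound(n, p_n, e0, k, q):
    """Upper bound e_n from (29), with (1 - k/p_n)^n computed stably."""
    decay = math.exp(n * math.log1p(-k / p_n))
    return e0 * decay + 2.0 * math.log(q) / p_n

def log_ratio(n, e0=1.0, k=0.5, q=100, zeta=0.5):
    """log(e_n) / log(n) for the schedule p_n = zeta * n / log(n) of Theorem 3.2."""
    p_n = zeta * n / math.log(n)
    return math.log(e_bound(n, p_n, e0, k, q)) / math.log(n)

# The ratio decreases toward -1, but only slowly because of log-log terms.
for n in (1e3, 1e6, 1e12, 1e100):
    print(n, log_ratio(n))

# With a fixed precision parameter the bound stalls at the floor 2 log(q) / p.
floor = e_bound(1e6, 50.0, 1.0, 0.5, 100)
print(floor, 2.0 * math.log(100) / 50.0)
```

The table makes two points of the analysis visible: a fixed $p$ leaves a residual error of order $\log q / p$, and even the best schedule only approaches the $1/n$ rate asymptotically.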


Corollary 3.1. Suppose that Assumption 3.1 holds. Let $\{p_n\}_{n=1}^\infty$, with $p_n > 1$, $n \in \mathbb{N}$, be a sequence of precision parameters and, for any $n \in \mathbb{N}$, let $\{x_i\}_{i=0}^n \subset \mathbb{R}^d$ be a sequence generated by Algorithm 3.1 with $p^* = p_n = \zeta n / \log n$, with $\zeta \in (0, k]$, where $k$ is as in Proposition 3.2. Then

\[ \limsup_{n \to \infty} \frac{\log(\psi(x_n) - \psi^*)}{\log n} \le -1. \tag{43} \]

Proof. From (28) and (29), $\psi(x_n) - \psi^* \le e_n$ for all $n \in \mathbb{N}$. Thus, for all $n \in \mathbb{N}$, $n > 1$,

\[ \frac{\log(\psi(x_n) - \psi^*)}{\log n} \le \frac{\log e_n}{\log n}. \tag{44} \]

The result then follows from Theorem 3.2. □

SQP algorithms for (P) achieve a superlinear rate of convergence; see for example [7, 9]. We note, however, that the computational work per iteration for SQP algorithms as discussed above is at least $O((d + q)^3)$. On the other hand, the computational work per iteration of Algorithm 3.1 is $O(qd^2)$ under Assumption 3.2. Hence, there may be classes of problem instances on which smoothing algorithms perform better than SQP algorithms.
The  next  section  gives  two  novel  smoothing  algorithms  that  aim  to  manage  the  precision 
parameter  effectively  to  avoid  ill-conditioning. 


4  Smoothing  Algorithms 

We present two smoothing algorithms to solve (P). The first algorithm, Algorithm 4.1 below, is based on Algorithm 3.2 in [13], but uses a much simpler rule for precision adjustment. The second algorithm, Algorithm 4.2 below, adopts a novel line search rule that aims to ensure descent in $\psi(\cdot)$ and, if that is not possible, increases the precision parameter. Previous smoothing algorithms [6, 13] do not check for descent in $\psi(\cdot)$.

We use the following notation. The $\epsilon$-active set, $\epsilon > 0$, is denoted by

\[ Q_\epsilon(x) = \{ j \in Q \mid \psi(x) - f^j(x) \le \epsilon \}. \tag{45} \]
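Computing the $\epsilon$-active set (45) from a vector of function values is a one-liner. The following Python sketch uses 0-based indices, which is a convention of this example rather than of the paper.

```python
def eps_active_set(fvals, eps):
    """Indices j with psi(x) - f^j(x) <= eps, where psi(x) = max_j f^j(x); see (45)."""
    top = max(fvals)
    return {j for j, f in enumerate(fvals) if top - f <= eps}

# Made-up function values at some iterate x.
fvals = [1.0, 0.95, 0.2, 1.0]
print(eps_active_set(fvals, 0.1))   # active and almost-active functions
print(eps_active_set(fvals, 0.0))   # exactly active functions only
```

Only the functions in this set need their gradients evaluated when a working set is built from it, which is where the savings of an active-set strategy come from.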


Similar to Algorithm 3.2 of [13], we compute a search direction using a $d \times d$ matrix $B_{p\Omega}(x)$. We consider two options. When

\[ B_{p\Omega}(x) = I, \tag{46} \]

the $d \times d$ identity matrix, the search direction is equivalent to the steepest descent direction. When

\[ B_{p\Omega}(x) = \eta_{p\Omega}(x) I + H_{p\Omega}(x), \tag{47} \]

the search direction is a Quasi-Newton direction, where


(48) 

\[ \eta_{p\Omega}(x) = \max\{0, \delta - e_{p\Omega}(x)\}, \tag{49} \]

and $e_{p\Omega}(x)$ is the smallest eigenvalue of $H_{p\Omega}(x)$.

We  next  present  the  two  algorithms  and  their  proofs  of  convergence. 

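Algorithm 4.1.

Data: $x_0 \in \mathbb{R}^d$.

Parameters: $\alpha, \beta \in (0,1)$, $p_0 > 1$, $\omega = 10 \log q / p_0$, function $B_{p\Omega}(\cdot)$ as in (46) or (47), $\epsilon_0 > 0$, $\xi > 1$, $\varsigma > 1$.

Step 1. Set $i = 0$, $j = 0$, $\Omega_0 = Q_{\epsilon_0}(x_0)$.

Step 2. Compute the search direction $h_{p_i\Omega_i}(x_i)$ by solving the equation

\[ B_{p_i\Omega_i}(x_i)\, h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{50} \]

Step 3. Compute the stepsize $\lambda_i = \beta^{k_i}$, where $k_i$ is the largest integer $k$ such that

\[ \psi_{p_i\Omega_i}(x_i + \beta^k h_{p_i\Omega_i}(x_i)) - \psi_{p_i\Omega_i}(x_i) \le -\alpha\beta^k \|h_{p_i\Omega_i}(x_i)\|^2 \tag{51} \]

and

\[ \psi_{p_i\Omega_i}(x_i + \beta^k h_{p_i\Omega_i}(x_i)) - \psi(x_i + \beta^k h_{p_i\Omega_i}(x_i)) \ge -\omega. \tag{52} \]

Step 4. Set

\[ x_{i+1} = x_i + \beta^{k_i} h_{p_i\Omega_i}(x_i), \tag{53} \]

\[ \Omega_{i+1} = \Omega_i \cup Q_{\epsilon_i}(x_{i+1}). \tag{54} \]

Step 5. Enter Subroutine 4.1, and go to Step 2 when exit Subroutine 4.1. □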

Subroutine 4.1. Adaptive Precision-Parameter Adjustment using Optimality Function

If

$$\theta_{p_i\Omega_i}(x_{i+1}) \ge -\epsilon_i, \tag{55}$$

set $x_j^* = x_{i+1}$, set $p_{i+1} = \xi p_i$, set $\epsilon_{i+1} = \epsilon_i/\varsigma$, replace $i$ by $i+1$, replace $j$ by $j+1$, and exit Subroutine 4.1.

Else, set $p_{i+1} = p_i$, set $\epsilon_{i+1} = \epsilon_i$, replace $i$ by $i+1$, and exit Subroutine 4.1. □
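The precision adjustment test can be sketched in a few lines. This is an illustration only: the function name `adjust_precision` and the default values are hypothetical, and the rule that shrinks the tolerance by dividing it by a factor `varsigma` is an assumption made for this sketch.

```python
def adjust_precision(theta, p, eps, xi=2.0, varsigma=2.0):
    """One pass of a Subroutine 4.1-style test.

    theta : value of the smoothed optimality function at the new iterate (<= 0)
    p     : current precision parameter
    eps   : current tolerance
    Returns (p_next, eps_next, recorded), where recorded tells whether the
    new iterate is stored as an approximate stationary point.
    """
    if theta >= -eps:
        # Nearly stationary for the smoothed problem: tighten the approximation.
        return xi * p, eps / varsigma, True
    # Otherwise keep working at the current precision and tolerance.
    return p, eps, False
```

The precision parameter therefore changes only when the smoothed problem has been solved to within the current tolerance.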

Steps 1 to 4 of Algorithm 4.1 are identical to those of Algorithm 3.2 of [13]. The key difference between the two algorithms is the simplified rule to adjust $p_i$ in Subroutine 4.1. This difference calls for a different proof of convergence than the one in [13], which we base on consistent approximation. The next result is identical to Lemma 3.1 in [13].

Lemma 4.1. Suppose that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a sequence constructed by Algorithm 4.1. Then, there exists an $i^* \in \mathbb{N}_0$ and a set $\Omega^* \subset Q$ such that the working sets $\Omega_i = \Omega^*$ for all $i \ge i^*$.

Proof. By construction, the cardinality of the working sets $\{\Omega_i\}_{i=0}^\infty$ is monotonically increasing. Since the set $Q$ is finite, the lemma must be true. □

Theorem 4.1. Suppose that Assumption 2.1 holds. Then, any accumulation point $x^* \in \mathbb{R}^d$ of a sequence $\{x_j^*\}_{j=0}^\infty \subset \mathbb{R}^d$ constructed by Algorithm 4.1 satisfies the first-order optimality condition $\theta_Q(x^*) = 0$.

Proof. Let $\Omega^* \subset Q$ and $i^* \in \mathbb{N}_0$ be as in Lemma 4.1, where $\Omega_i = \Omega^*$ for all $i \ge i^*$. As Algorithm 4.1 has the form of Master Algorithm Model 3.3.12 in [17] for all $i \ge i^*$, we conclude based on Theorem 3.3.13 in [17] that any accumulation point $x^*$ of a sequence $\{x_j^*\}_{j=0}^\infty$ constructed by Algorithm 4.1 satisfies $\theta_{\Omega^*}(x^*) = 0$. The assumptions required to invoke Theorem 3.3.13 in [17] are (i) continuity of $\psi_{\Omega^*}(\cdot)$, $\psi_{p\Omega^*}(\cdot)$, $\theta_{\Omega^*}(\cdot)$, and $\theta_{p\Omega^*}(\cdot)$, $p > 0$, which follows by Assumption 2.1, Proposition 2.1(i), Theorem 2.1.6 of [17], and Proposition 2.1(iii); (ii) the pair $((P_{p\Omega^*}), \theta_{p\Omega^*}(\cdot))$ must be a consistent approximation to $((P_{\Omega^*}), \theta_{\Omega^*}(\cdot))$, which follows by Theorem 2.1; and (iii) if Steps 1 to 4 of Algorithm 4.1 are applied repeatedly to $(P_{p\Omega^*})$ with a fixed $p > 0$, then every accumulation point $\hat{x}$ of a sequence $\{x_i\}_{i=0}^\infty$ constructed must be a stationary point of $(P_{p\Omega^*})$, i.e., $\theta_{p\Omega^*}(\hat{x}) = 0$, which follows by Theorem 3.2 in [13].

Since $\theta_{\Omega^*}(x^*) = 0$, from (14), there exists a $\hat{\mu} \in \Sigma_{\Omega^*}$ such that

$$\sum_{j \in \Omega^*} \hat{\mu}^j \big(\psi_{\Omega^*}(x^*) - f^j(x^*)\big) + \frac{1}{2}\Big\| \sum_{j\in\Omega^*} \hat{\mu}^j \nabla f^j(x^*) \Big\|^2 = 0. \tag{56}$$

Let $\pi \in \Sigma_Q$, $\pi^j = 0$ for $j \in Q - \Omega^*$, and $\pi^j = \hat{\mu}^j$ for $j \in \Omega^*$. Thus, it follows from (14) that

$$\theta_Q(x^*) \ge -\sum_{j\in Q}\pi^j\big(\psi(x^*) - f^j(x^*)\big) - \frac{1}{2}\Big\|\sum_{j\in Q}\pi^j\nabla f^j(x^*)\Big\|^2 = 0. \tag{57}$$

Since $\theta_Q(\cdot)$ is a nonpositive function, the result follows. □


Algorithm  4.2. 

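Data: $x_0 \in \mathbb{R}^d$.

Parameters: $\alpha, \beta \in (0,1)$, function $B_{p\Omega}(\cdot)$ as in (46) or (47), $\epsilon > 0$, $\delta \ge 1$, $p_0 \ge 1$, $\hat{p} \gg p_0$, $\kappa \gg 1$, $\xi > 1$, $\gamma > 0$, $\nu \in (0,1)$, $\Delta p \ge 1$.

Step 0. Set $i = 0$, $\Omega_0 = Q_\epsilon(x_0)$, $k_{-1} = 0$.

Step 1. Compute $B_{p_i\Omega_i}(x_i)$ and its largest eigenvalue $\sigma^{\max}_{p_i\Omega_i}(x_i)$. If

$$\sigma^{\max}_{p_i\Omega_i}(x_i) \ge \kappa, \tag{58}$$

compute the search direction

$$h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{59}$$

Else, compute the search direction $h_{p_i\Omega_i}(x_i)$ by solving the equation

$$B_{p_i\Omega_i}(x_i) h_{p_i\Omega_i}(x_i) = -\nabla\psi_{p_i\Omega_i}(x_i). \tag{60}$$

Step 2a. Compute a tentative Armijo stepsize based on the working set $\Omega_i$, starting from the eventual stepsize of the previous iterate $k_{i-1}$, i.e., determine

$$\lambda_{p_i\Omega_i}(x_i) = \max_{l} \big\{\beta^l \mid \psi_{p_i\Omega_i}(x_i + \beta^l h_{p_i\Omega_i}(x_i)) - \psi_{p_i\Omega_i}(x_i) \le \alpha\beta^l \langle \nabla\psi_{p_i\Omega_i}(x_i), h_{p_i\Omega_i}(x_i)\rangle \big\}, \tag{61}$$

and let $l$ denote the exponent attaining the maximum. Set

$$y_i = x_i + \lambda_{p_i\Omega_i}(x_i) h_{p_i\Omega_i}(x_i). \tag{62}$$

Step 2b. Forward track from $y_i$ along direction $h_{p_i\Omega_i}(x_i)$ as long as $\psi(\cdot)$ continues to decrease using the following subroutine.

Substep 0. Set $l' = l$,

$$z_{i,l'} = x_i + \beta^{l'} h_{p_i\Omega_i}(x_i) \quad \text{and} \quad z_{i,l'-1} = x_i + \beta^{l'-1} h_{p_i\Omega_i}(x_i). \tag{63}$$

Substep 1. If

$$\psi(z_{i,l'-1}) < \psi(z_{i,l'}), \tag{64}$$

replace $l'$ by $l'-1$, set $z_{i,l'-1} = x_i + \beta^{l'-1} h_{p_i\Omega_i}(x_i)$, and repeat Substep 1. Else, set $z_i = z_{i,l'}$.

Substep 2. If $p_i < \hat{p}$, go to Step 3. Else, go to Step 4.

Step 3. If

$$\psi(z_i) - \psi(x_i) \le -\frac{\gamma}{p_i^{\nu}}, \tag{65}$$

set $x_{i+1} = z_i$, $p_{i+1} = p_i$, $k_i = l'$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1. Else, replace $p_i$ by $\xi p_i$, replace $\Omega_i$ by $\Omega_i \cup Q_\epsilon(z_i)$, and go to Step 1.

Step 4. If

$$\psi(z_i) - \psi(x_i) \le -\frac{\gamma}{p_i^{\nu}}, \tag{66}$$

set $x_{i+1} = z_i$, $k_i = l'$, set $p_{i+1} = p_i + \Delta p$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1.

Else, set $x_{i+1} = y_i$, $k_i = l$, set $p_{i+1} = p_i + \Delta p$, set $\Omega_{i+1} = \Omega_i \cup Q_\epsilon(x_{i+1})$, replace $i$ by $i+1$, and go to Step 1. □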

As is standard in stabilized Newton methods (see for example Section 1.4.4 of [17]), Algorithm 4.2 switches to the steepest descent direction if $B_{p\Omega}(\cdot)$ is given by (47) and the largest eigenvalue of $B_{p\Omega}(\cdot)$ is large; see Step 1. Compared to Algorithm 3.2 in [13], which increases $p$ when the smoothed function gradient is small, Algorithm 4.2 increases the precision parameter only when it does not produce sufficient descent in $\psi(\cdot)$, as verified by (65) and (66). A small precision parameter may produce an ascent direction in $\psi(\cdot)$ due to the poor accuracy of the smoothed function approximation. Thus, insufficient descent is a signal that the precision parameter may be too small. All existing smoothing algorithms only ensure that $\psi_{p\Omega}(\cdot)$ decreases at each iteration, but do not ensure descent in $\psi(\cdot)$. Another change as compared to [6, 13] relates to the line search. All smoothing algorithms are susceptible to ill-conditioning and small stepsizes. To counteract this difficulty, Algorithm 4.2 moves forward along the search direction starting from the Armijo step, and stops when the next step is not a descent step in $\psi(\cdot)$; see Step 2b.
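The two-phase line search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are plain Python lists, the Armijo search is assumed to try exponents `k_start, k_start + 1, ...`, and the cap `max_steps` is a safeguard added here. All names (`armijo_then_forward_track`, `slope`, `k_start`) are hypothetical.

```python
def armijo_then_forward_track(psi_p, psi, x, h, slope,
                              alpha=0.5, beta=0.8, k_start=0, max_steps=200):
    """Armijo step on the smoothed function, then forward tracking on psi.

    psi_p : smoothed objective, psi : original max-function
    x, h  : current point and search direction (lists of floats)
    slope : inner product of the smoothed gradient at x with h (negative)
    Returns (l, l_prime, y, z): Armijo exponent, final exponent, and points.
    """
    def point(l):
        step = beta ** l
        return [xi + step * hi for xi, hi in zip(x, h)]

    # Phase 1 (Step 2a): first exponent l >= k_start passing the Armijo test.
    base = psi_p(x)
    l = k_start
    for _ in range(max_steps):
        if psi_p(point(l)) - base <= alpha * beta ** l * slope:
            break
        l += 1
    else:
        raise RuntimeError("no Armijo step found")

    # Phase 2 (Step 2b): lengthen the step while psi itself keeps decreasing.
    l_prime = l
    for _ in range(max_steps):
        if psi(point(l_prime - 1)) < psi(point(l_prime)):
            l_prime -= 1
        else:
            break
    return l, l_prime, point(l), point(l_prime)
```

Note that the second phase evaluates the original function rather than the smoothed one, which is the distinguishing feature of the line search discussed in the text.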

Algorithm 4.2 has two rules for increasing $p_i$. In the early stages of the calculations, i.e., when $p_i < \hat{p}$, if sufficient descent in $\psi(\cdot)$ is achieved when moving from $x_i$ to $z_i$ ((65) satisfied), then Algorithm 4.2 sets the next iterate $x_{i+1}$ to $z_i$ and retains the current value of the precision parameter, as progress is made towards the optimal solution of (P). However, if (65) fails, then there is insufficient descent and the precision parameter or the working set needs to be modified to generate a better search direction in the next iteration. In the late stages of the calculations, i.e., when $p_i \ge \hat{p}$, Algorithm 4.2 accepts every new point generated, even those with insufficient descent, and increases the precision parameter by a constant value.
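The two rules can be summarized in a small decision function. This is a sketch only; the function name `next_precision` and the string labels for the outcomes are hypothetical, and the sufficient-descent test is assumed to take the form $\psi(z_i) - \psi(x_i) \le -\gamma/p_i^{\nu}$.

```python
def next_precision(psi_x, psi_z, p, p_hat, gamma, nu, xi, delta_p):
    """Decide the next move and precision parameter.

    Returns (move, p_next) where move is
      "z"     : accept the forward-tracked point z_i,
      "y"     : accept the Armijo point y_i (late stage only),
      "retry" : stay at x_i and recompute the direction (early stage only).
    """
    sufficient = psi_z - psi_x <= -gamma / p ** nu
    if p < p_hat:
        # Early stage: multiply p only when descent in psi is insufficient.
        if sufficient:
            return "z", p
        return "retry", xi * p
    # Late stage: always move, and always add the constant increment.
    return ("z" if sufficient else "y"), p + delta_p
```

The multiplicative rule raises the precision quickly while the approximation is poor, whereas the additive rule in the late stage keeps the growth of $p_i$ slow.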

The  next  lemma  is  similar  to  Lemma  4.1. 

Lemma 4.2. Suppose that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a sequence constructed by Algorithm 4.2. Then, there exists an $i^* \in \mathbb{N}_0$ and a set $\Omega^* \subset Q$ such that the working sets $\Omega_i = \Omega^*$ and $\psi_{\Omega^*}(x_i) = \psi(x_i)$ for all $i \ge i^*$.

Proof. The first part of the proof follows exactly from the proof for Lemma 4.1. Next, since $Q_\epsilon(x_i) \subset \Omega_i$ for all $i$ (see Steps 3 and 4 of Algorithm 4.2), $\psi_{\Omega^*}(x_i) = \psi(x_i)$ for all $i \ge i^*$. □

Lemma 4.3. Suppose that Assumption 2.1 holds, and that the sequences $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ and $\{p_i\}_{i=0}^\infty \subset \mathbb{R}$ are generated by Algorithm 4.2. Then, the following properties hold: (i) the sequence $\{p_i\}_{i=0}^\infty$ is monotonically increasing; (ii) if the sequence $\{x_i\}_{i=0}^\infty$ has an accumulation point, then $p_i \to \infty$ as $i \to \infty$, and $\sum_{i=0}^\infty 1/p_i = +\infty$.

Proof. We follow the framework of the proof for Lemma 3.1 of [6]. (i) The precision parameter is adjusted in Steps 3 and 4 of Algorithm 4.2. In Step 3, if (65) is satisfied, then $p_{i+1} = p_i$; if (65) fails, $p_i$ is replaced by $\xi p_i > p_i$. In Step 4, $p_{i+1} = p_i + \Delta p \ge p_i + 1 > p_i$.

(ii) Suppose that Algorithm 4.2 generates the sequence $\{x_i\}_{i=0}^\infty$ with accumulation point $x^* \in \mathbb{R}^d$, but $\{p_i\}_{i=0}^\infty$ is bounded from above. The existence of an upper bound on $p_i$ implies that $p_i < \hat{p}$ for all $i \in \mathbb{N}_0$, because if not, Algorithm 4.2 will enter Step 4 the first time at some iteration $i' \in \mathbb{N}_0$, and re-enter Step 4 for all $i > i'$, and $p_i \to \infty$ as $i \to \infty$. Thus, the existence of an upper bound on $p_i$ implies that Algorithm 4.2 must never enter Step 4.

The existence of an upper bound on $p_i$ also implies that there exists an iteration $i^* \in \mathbb{N}_0$ such that (65) is satisfied for all $i \ge i^*$, because if not, $p_i$ will be replaced by $\xi p_i$ repeatedly, and $p_i \to \infty$ as $i \to \infty$. This means that $\psi(x_{i+1}) - \psi(x_i) \le -\gamma/p_i^{\nu}$ for all $i \ge i^*$. Since $p_i < \hat{p}$ for all $i \in \mathbb{N}_0$, $\psi(x_i) \to -\infty$ as $i \to \infty$. However, by continuity of $\psi(\cdot)$, and $x^*$ being an accumulation point, $\psi(x_i) \to^{K} \psi(x^*)$, where $K \subset \mathbb{N}_0$ is some infinite subset. This is a contradiction, so $p_i \to \infty$.

Next, we prove that $\sum_{i=0}^\infty 1/p_i = +\infty$. Since $p_i \to \infty$, there exists an iteration $i^* \in \mathbb{N}_0$ such that $p_i \ge \hat{p}$ for all $i \ge i^*$. This means that the precision parameter will be adjusted by the rule $p_{i+1} = p_i + \Delta p$ for all $i \ge i^*$. The proof is complete by the fact that $\sum_{i=1}^\infty 1/i = \infty$. □

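The final step of the divergence argument can be spelled out as follows. This is a sketch that uses only the additive rule $p_{i+1} = p_i + \Delta p$ for all $i \ge i^*$. For $m \ge 1$ we have $p_{i^*+m} = p_{i^*} + m\Delta p \le m(p_{i^*} + \Delta p)$, so

$$\sum_{i=i^*+1}^{\infty} \frac{1}{p_i} = \sum_{m=1}^{\infty} \frac{1}{p_{i^*} + m\Delta p} \ge \frac{1}{p_{i^*} + \Delta p}\sum_{m=1}^{\infty}\frac{1}{m} = +\infty.$$

By contrast, a purely multiplicative rule $p_{i+1} = \xi p_i$ with $\xi > 1$ would turn $\sum 1/p_i$ into a convergent geometric series.
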
Lemma 4.4. Suppose that Assumption 2.1 holds. Then, for every bounded set $S \subset \mathbb{R}^d$ and parameters $\alpha, \beta \in (0,1)$, there exists a $K_S < \infty$ such that, for all $p \ge 1$, $\Omega \subset Q$, and $x \in S$,

$$\psi_{p\Omega}(x + \lambda_{p\Omega}(x) h_{p\Omega}(x)) - \psi_{p\Omega}(x) \le -\frac{\alpha K_S \|\nabla\psi_{p\Omega}(x)\|^2}{p}, \tag{67}$$

where $\lambda_{p\Omega}(x)$ is the stepsize defined by (61), with $p_i$ replaced by $p$, $\Omega_i$ replaced by $\Omega$, and $x_i$ replaced by $x$.

Proof. If $h_{p\Omega}(x)$ is given by (60) with $B_{p\Omega}(x)$ as in (46), then the result follows by the same arguments as in the proof for Lemma 3.2 of [6]. If $h_{p\Omega}(x)$ is given by (60) with $B_{p\Omega}(x)$ as in (47), then the result follows by similar arguments as in the proof for Lemma 3.4 of [6], but the argument deviates to account for the fact that the lower bound on the eigenvalues of $B_{p\Omega}(x)$ takes on the specific value of 1 in Algorithm 4.2. □

Lemma 4.5. Suppose that Assumption 2.1 holds and that $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$ is a bounded sequence generated by Algorithm 4.2. Let $\Omega^* \subset Q$ and $i^* \in \mathbb{N}_0$ be as in Lemma 4.2, where $\Omega_i = \Omega^*$ for all $i \ge i^*$. Then, there exists an accumulation point $x^* \in \mathbb{R}^d$ of the sequence $\{x_i\}_{i=0}^\infty$ such that $\theta_{\Omega^*}(x^*) = 0$.

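Proof. Suppose that $\{x_i\}_{i=0}^\infty$ is a bounded sequence generated by Algorithm 4.2. Suppose that there exists a $\rho > 0$ such that

$$\liminf_{i\to\infty} \|\nabla\psi_{p_i\Omega_i}(x_i)\| \ge \rho. \tag{68}$$

Since $\{x_i\}_{i=0}^\infty$ is a bounded sequence, it has at least one accumulation point. Hence, by Lemma 4.3, $p_i \to \infty$, as $i \to \infty$. Consider two cases, $x_{i+1} = y_i$ or $x_{i+1} = z_i$ in Algorithm 4.2. If $x_{i+1} = y_i$, by Lemma 4.4, there exists an $M < \infty$ such that

$$\psi_{p_i\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \le -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \tag{69}$$

for $i \ge i^*$. Hence,

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) = \psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_{i+1}) + \psi_{p_i\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \le -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \tag{70}$$

for $i \ge i^*$, where we have used the fact from Proposition 2.1 that

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) \le \psi_{p_i\Omega^*}(x_{i+1}), \tag{71}$$

for $i \ge i^*$, because $p_{i+1} \ge p_i$ from Lemma 4.3.

Next, if $x_{i+1} = z_i$, then (65) or (66) is satisfied. It follows from (9) and Lemma 4.2 that

$$\begin{aligned}
\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) &\le \psi_{\Omega^*}(x_{i+1}) + \frac{\log|\Omega^*|}{p_{i+1}} - \psi_{\Omega^*}(x_i) \\
&= \psi(x_{i+1}) + \frac{\log|\Omega^*|}{p_{i+1}} - \psi(x_i) \\
&\le \frac{-\gamma + p_i^{\nu-1}\log|\Omega^*|}{p_i^{\nu}}.
\end{aligned} \tag{72}$$

From (70) and (72), for all $i \ge i^*$,

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \le \max\left\{ -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i},\ \frac{-\gamma + p_i^{\nu-1}\log|\Omega^*|}{p_i^{\nu}} \right\}. \tag{73}$$

By Proposition 2.1, $\|\nabla\psi_{p_i\Omega^*}(x_i)\|$ is bounded because $\{x_i\}_{i=0}^\infty$ is bounded. Since $\nu \in (0,1)$, there exists an $i^{**} \in \mathbb{N}_0$, where $i^{**} \ge i^*$, such that

$$-\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \ge \frac{-\gamma + p_i^{\nu-1}\log|\Omega^*|}{p_i^{\nu}} \tag{74}$$

for all $i \ge i^{**}$. Therefore, from (73),

$$\psi_{p_{i+1}\Omega^*}(x_{i+1}) - \psi_{p_i\Omega^*}(x_i) \le -\frac{\alpha M\|\nabla\psi_{p_i\Omega^*}(x_i)\|^2}{p_i} \tag{75}$$

for all $i \ge i^{**}$. Since by Lemma 4.3, $\sum_{i=0}^\infty 1/p_i = +\infty$, it follows from (70) and (75) that

$$\psi_{p_i\Omega^*}(x_i) \to -\infty, \text{ as } i \to \infty. \tag{76}$$

Let $x^*$ be an accumulation point of $\{x_i\}_{i=0}^\infty$. That is, there exists an infinite subset $K \subset \mathbb{N}_0$ such that $x_i \to^{K} x^*$. Based on (9), Lemma 4.3, and continuity of $\psi_{\Omega^*}(\cdot)$, it follows that $\psi_{p_i\Omega^*}(x_i) \to^{K} \psi_{\Omega^*}(x^*)$, as $i \to \infty$, which contradicts (76). Hence, $\liminf_{i\to\infty} \|\nabla\psi_{p_i\Omega^*}(x_i)\| = 0$. Consequently, there exists an infinite subset $K^* \subset \mathbb{N}_0$ and an $x^* \in \mathbb{R}^d$ such that $x_i \to^{K^*} x^*$ and $\theta_{p_i\Omega^*}(x_i) \to^{K^*} 0$, as $i \to \infty$, which implies that $\limsup_{i\to\infty} \theta_{p_i\Omega^*}(x_i) \ge 0$. From Definition 2.1, Theorem 2.1, and the fact that $\theta_{\Omega^*}(\cdot)$ is a nonpositive function, $\theta_{\Omega^*}(x^*) = 0$. □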
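The smoothing error bound used in the proof above, $0 \le \psi_{p\Omega}(x) - \psi_\Omega(x) \le \log|\Omega|/p$, and the monotonicity in $p$ used in (71) can be checked numerically. The sketch below assumes the standard exponential smoothing form $\psi_p = \frac{1}{p}\log\sum_j \exp(p f^j)$; the function names are hypothetical.

```python
import math


def psi_max(fvals):
    """Original max-function value for a list of f_j(x)."""
    return max(fvals)


def psi_smooth(fvals, p):
    """(1/p) * log(sum_j exp(p * f_j)), computed stably by factoring out the max."""
    m = max(fvals)
    return m + math.log(sum(math.exp(p * (f - m)) for f in fvals)) / p


def smoothing_gap(fvals, p):
    """Gap between the smoothed and the original function values."""
    return psi_smooth(fvals, p) - psi_max(fvals)
```

For example, with `fvals = [0.3, 1.0, 0.95, -2.0]` the gap stays within `math.log(4) / p` and shrinks as `p` grows.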

Theorem 4.2. Suppose that Assumption 2.1 holds. (i) If Algorithm 4.2 constructs a bounded sequence $\{x_i\}_{i=0}^\infty \subset \mathbb{R}^d$, then there exists an accumulation point $x^* \in \mathbb{R}^d$ of the sequence $\{x_i\}_{i=0}^\infty$ that satisfies $\theta_Q(x^*) = 0$. (ii) If Algorithm 4.2 constructs a finite sequence $\{x_i\}_{i=0}^{i^*} \subset \mathbb{R}^d$, where $i^* < \infty$, then Step 2b constructs an unbounded infinite sequence $\{z_{i^*,l'}\}_{l'=l}^{-\infty}$ with

$$\psi(z_{i^*,l'-1}) < \psi(z_{i^*,l'}) \tag{77}$$

for all $l' \in \{l, l-1, l-2, \ldots\}$, where $l$ is the tentative Armijo stepsize computed in Step 2a.

Proof. First, we consider (i). Let the set $\Omega^* \subset Q$ be as in Lemma 4.2, where $\Omega_i = \Omega^*$ for all $i \ge i^*$. Based on Lemma 4.5, there exists an accumulation point of the sequence $\{x_i\}_{i=0}^\infty$, $x^* \in \mathbb{R}^d$, such that $\theta_{\Omega^*}(x^*) = 0$. The conclusion then follows by similar arguments as in Theorem 4.1.

We next consider (ii). Algorithm 4.2 constructs a finite sequence only if it jams in Step 2b. Then, Substep 1 constructs an infinite sequence $\{z_{i^*,l'}\}_{l'=l}^{-\infty}$ satisfying (77) for all $l' \in \{l, l-1, l-2, \ldots\}$. The infinite sequence is unbounded since $h_{p_{i^*}\Omega_{i^*}}(x_{i^*}) \ne 0$, as (77) cannot hold otherwise, and $\beta \in (0,1)$. □

Next, we consider the run-time complexity of Algorithms 4.1 and 4.2 to achieve a near-optimal solution of (P). Suppose that all functions $f^j(\cdot)$ are active, i.e., $\Omega^* = Q$, near an optimal solution. If $B_{p\Omega}(\cdot)$ is given by (46), then the main computational work in each iteration of Algorithms 4.1 and 4.2 is the calculation of $\nabla\psi_{p\Omega}(\cdot)$, which takes $O(qd^2)$ operations under Assumption 3.2; see the proof of Theorem 3.1. If $B_{p\Omega}(\cdot)$ is given by (47), then the main computational work is the calculation of (47) and $h_{p\Omega}(x)$. Under Assumption 3.2, it takes $O(qd)$ operations to compute $f^j(x)$, $j \in Q$, $O(qd^2)$ to compute $\nabla f^j(x)$, $j \in Q$, $O(d^2)$ to multiply $\nabla f^j(x)\nabla f^j(x)^T$, $O(qd^2)$ to sum $\sum_{j\in\Omega}\mu_p^j(x)\nabla f^j(x)\nabla f^j(x)^T$, $O(qd)$ to sum $\sum_{j\in\Omega}\mu_p^j(x)\nabla f^j(x)$, and $O(d^2)$ to multiply $\big[\sum_{j\in\Omega}\mu_p^j(x)\nabla f^j(x)\big]\big[\sum_{j\in\Omega}\mu_p^j(x)\nabla f^j(x)\big]^T$.

The minimum eigenvalue computation of $H_{p\Omega}(x)$ for Algorithm 4.2 takes $O(d^2)$ operations (see [21]). In all, the number of operations to obtain $B_{p\Omega}(x)$ is $O(qd^2)$. A direct method for solving a linear system of equations to compute $h_{p\Omega}(x)$ results in $O(d^3)$ operations; see for example page 63 of [22]. Hence, if $B_{p\Omega}(\cdot)$ is given by (47), then the computational work in each iteration of Algorithms 4.1 and 4.2 is $O(qd^2 + d^3)$.
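The operation counts for the weighted sums listed above can be read off a direct implementation. The sketch below is illustrative only: it assumes the multipliers and gradients are given as plain lists, and the name `weighted_outer_sums` is hypothetical. It forms the weighted sum of gradient outer products, the weighted gradient sum, and the outer product of that sum with itself.

```python
def weighted_outer_sums(mu, grads):
    """Return (S, g, G) for multipliers mu[j] and gradient vectors grads[j].

    S = sum_j mu_j * g_j g_j^T   (q terms, O(d^2) each, O(q d^2) in total)
    g = sum_j mu_j * g_j         (q terms, O(d) each,   O(q d) in total)
    G = g g^T                    (a single O(d^2) product)
    """
    d = len(grads[0])
    S = [[0.0] * d for _ in range(d)]
    g = [0.0] * d
    for m, gj in zip(mu, grads):
        for a in range(d):
            g[a] += m * gj[a]
            for b in range(d):
                S[a][b] += m * gj[a] * gj[b]
    G = [[g[a] * g[b] for b in range(d)] for a in range(d)]
    return S, g, G
```

The doubly nested loop inside the sum over the $q$ functions is where the $O(qd^2)$ term comes from.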

It is unclear how many iterations Algorithms 4.1 and 4.2 would need to achieve a near-optimal solution as a function of $q$. However, since they may utilize Quasi-Newton search directions and adaptive precision adjustment, there is reason to believe that the number of iterations will be no larger than that of Algorithm 3.1, which uses the steepest descent direction and a fixed precision parameter. Thus, suppose that for some tolerance $t > 0$, the number of iterations of Algorithms 4.1 and 4.2 to generate $\{x_i\}_{i=0}^n$, with the last iterate $x_n$ meeting the tolerance $t$, is no larger than $O(\log q)$, as is the case for Algorithm 3.1. Then, focusing on $q$, we find that under these assumptions, the run-time complexity of Algorithms 4.1 and 4.2 to generate a near-optimal solution is no larger than $O(q \log q)$.

5 Numerical Results


We  present  an  empirical  comparison  of  Algorithms  4.1  and  4.2  with  algorithms  from  the 
literature  over  a  set  of  problem  instances  from  [6,  7]  as  well  as  randomly  generated  instances; 
see  Appendix  A  and  Table  1.  This  study  appears  to  be  the  first  systematic  comparison  of 
smoothing  and  SQP  algorithms  for  large-scale  problems.  We  examine  problem  instances 
with  number  of  functions  up  to  three  orders  of  magnitude  larger  than  previously  reported. 

Specifically, we examine (i) Algorithm 2.1 of [7], an SQP algorithm with two QPs that we refer to as SQP-2QP, (ii) Algorithm A in [9], a one-QP SQP algorithm that we refer to as SQP-1QP, (iii) Algorithm 3.2 in [13], a smoothing Quasi-Newton algorithm referred to as SMQN, (iv) the Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in [17]), referred to as PPP, (v) an active-set version of PPP as stated in Algorithm 2.4.34 in [17] (see also [23]), which we refer to as $\epsilon$-PPP, and (vi) Algorithms 4.1 and 4.2 of the present paper. We refer to Appendix B for details about algorithm parameters. With the exception of PPP and SQP-1QP, the above algorithms incorporate active-set strategies and, hence, appear especially promising for solving large-scale problems. We implement and run all algorithms in MATLAB version 7.7.0 (R2008b) (see [24]) on a 3.73 GHz PC using Windows XP SP3, with 3 GB of RAM. All QPs are solved using TOMLAB CPLEX version 7.0 (R7.0.0) (see [25]) with the Primal Simplex option, which preliminary studies indicate results in the smallest QP run time. We also examined the LSSOL QP solver (see [26]), but its run times appear inferior to those of CPLEX for large-scale QPs arising in the present context.

Algorithm  2.1  of  [7]  is  implemented  in  the  solver  CFSQP  [27]  and  we  have  verified  that 
our  MATLAB  implementation  of  that  algorithm  produces  comparable  results  in  terms  of 
number  of  iterations  and  run  time  as  CFSQP.  We  do  not  directly  compare  with  CFSQP 
as  we  find  it  more  valuable  to  compare  different  algorithms  using  the  same  implementation 
environment  (MATLAB)  and  the  same  QP  solver  (CPLEX). 

We carry out a comprehensive study to identify an $\epsilon$ (see (45)) in the algorithms' active-set strategies that minimizes the run time for the various algorithms over a wide range of $\epsilon$ (1,000 to $1 \cdot 10^{-20}$). We find that SQP-2QP is insensitive to the selection of $\epsilon$, primarily because the algorithm includes additional steps to aggressively trim the working set. $\epsilon$-PPP is highly sensitive to $\epsilon$, with run times varying within a factor of 200. SMQN, Algorithm 4.1, and Algorithm 4.2 accumulate functions in the working set and therefore are also sensitive to $\epsilon$. The run times of SMQN, Algorithm 4.1, and Algorithm 4.2 tend to vary within a factor of ten. The results below are obtained using the apparently best choice of $\epsilon$ for each algorithm.

For Algorithm 4.2, we mainly use the Quasi-Newton direction with $B_{p\Omega}(x)$ as defined in (47), because preliminary test runs show that generally, the alternate steepest descent direction with $B_{p\Omega}(x)$ as defined in (46) produces slower run times.

We  examine  all  problem  instances  from  [6,  7]  except  two  that  cannot  be  easily  extended 
to  large  q.  As  the  problem  instances  with  large  dimensionality  in  [6,  7]  do  not  allow  us 
to  adjust  the  number  of  functions,  we  create  two  additional  sets  of  problem  instances.  All 
problem  instances  are  described  in  detail  in  Appendix  A. 

We report run times to achieve a solution $x$ that satisfies

$$\psi(x) - \psi^{\text{target}} \le t, \tag{78}$$

where $\psi^{\text{target}}$ is a target value (see Appendix A) equal to the optimal value (if known) or a slightly adjusted value from the optimal values reported in [6, 7] for smaller $q$. We use $t = 10^{-5}$. Although this termination criterion cannot be used for real-world problems, where the optimal value is typically unknown, we find that it is the most useful criterion in this study.
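A timing harness built around criterion (78) might look as follows. This is only a sketch of the measurement idea, written in Python rather than the MATLAB environment used in the study; `step` stands for one iteration of whichever algorithm is being timed, and all names are hypothetical.

```python
import time


def run_until_target(step, psi, x0, psi_target, t=1e-5, max_iter=10000):
    """Iterate x <- step(x) until psi(x) - psi_target <= t.

    Returns (x, iterations, seconds). Raises if the target is not reached.
    """
    start = time.perf_counter()
    x = x0
    for n in range(max_iter + 1):
        if psi(x) - psi_target <= t:
            return x, n, time.perf_counter() - start
        x = step(x)
    raise RuntimeError("target not reached within max_iter iterations")
```

Because every algorithm is stopped by the same test on $\psi(\cdot)$, the reported times are comparable even though the algorithms use different internal stopping measures.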

Table  2  summarizes  the  run  times  (in  seconds)  of  the  various  algorithms,  with  columns 
2  and  3  giving  the  number  of  variables  d  and  functions  q,  respectively.  Run  times  in  boldface 
indicate  that  the  particular  algorithm  has  the  shortest  run  time  for  the  specific  problem 
instance.  The  numerical  results  in  Table  2  indicate  that  in  most  problem  instances,  the 
run  times  are  shortest  for  SQP-2QP  or  Algorithm  4.2.  Table  2  indicates  that  SQP-2QP 
is  significantly  more  efficient  than  SQP-1QP  for  problem  instances  ProbA-ProbG.  This  is 
due  to  the  efficiency  of  the  active-set  strategy  in  SQP-2QP,  which  is  absent  in  SQP-1QP. 
However,  for  ProbJ-ProbM,  SQP-1QP  is  comparable  to  SQP-2QP.  This  is  because  at  the 
optimal  solution  of  ProbJ-ProbM,  all  the  functions  are  active.  This  causes  the  active-set 
strategy  in  SQP-2QP  to  lose  its  effectiveness  as  the  optimal  solution  is  approached. 

Table 2 also indicates that Algorithm 4.1 is significantly more efficient than SMQN for most problem instances. As the only difference between the two algorithms lies in their precision-parameter adjustment schemes, this highlights the sensitivity of the performance of smoothing algorithms to the control of their precision parameters. Table 2 also shows that Algorithm 4.2 is more efficient than Algorithm 4.1 and SMQN for most problem instances.

Table 2 indicates that SQP-2QP is generally more efficient than Algorithm 4.2 for problem instances with small dimensionality, $d \le 4$ (specifically ProbA-ProbG), and vice versa. This is consistent with the common observation that SQP-type algorithms may be inefficient for instances of large dimensionality; see for example [7].

Table  2  shows  that  some  algorithms  return  locally  optimal  solutions  for  some  problem 
instances  (labeled  “local”  in  Table  2).  In  view  of  these  results,  there  is  an  indication  that 
smoothing  algorithms  (SMQN,  Algorithms  4.1  and  4.2)  tend  to  find  global  minima  more 
frequently  than  PPP  and  SQP  algorithms. 

Table 3 presents results similar to those in Table 2, but for larger $q$. We do not present results for PPP and SQP-1QP as the required QPs exceed the memory limit. The comprehensive sensitivity studies for $\epsilon$ show significant improvement for Algorithm 4.2 for ProbJ-ProbM if a large $\epsilon$ is used. Hence, we include the results for Algorithm 4.2 with $\epsilon = 1000$ in Table 3. Note that such a large $\epsilon$ means that there is effectively no active-set strategy. Sensitivity tests conducted for the other algorithms with a larger $\epsilon$ show no improvement in their run times.

The observations from Table 3 are similar to those for Table 2. Table 3 indicates that Algorithm 4.2 with $\epsilon = 1000$ is efficient for ProbJ-ProbM, which are large-dimensionality problem instances with a significant number of functions active at the optimal solution. For completeness, the run times for Algorithm 4.2 with $\epsilon = 1000$ for ProbJ-ProbM in Table 2 are 2.8, 14.3, 0.36 and 3.0 seconds, respectively, while the run times for the other problem instances are slower than those of Algorithm 4.2 with $\epsilon = 10^{-20}$.

The results in Tables 2 and 3 indicate that among the algorithms considered, SQP-2QP and Algorithm 4.2 are the most efficient algorithms for minimax problems with a large number of functions. The run times for ProbJ-ProbM indicate that SQP-2QP is less efficient for problem instances with a significant number of functions that are $\epsilon$-active at the solution, as the active-set strategy loses its effectiveness.

The problem instances from the literature examined in Tables 2 and 3 include either cases with few functions $\epsilon$-active at an optimal solution (ProbA-ProbI) or cases with all functions $\epsilon$-active (ProbJ-ProbM). We also examine randomly generated problem instances with an intermediate number of functions $\epsilon$-active at the optimal solution; see ProbN in Table 1. The optimal values are unknown in this case but the target values as given in Table 1 appear to be close to the global minima.

Table 4 presents the run times for Algorithm 4.2 and SQP-2QP on ProbN. As the problem instances are relatively well-conditioned, Algorithm 4.2 with $B_{p\Omega}(\cdot)$ given by (46), i.e., a steepest descent (SD) direction, may perform well and is included in the table. The parameter $\epsilon$ for Algorithm 4.2 is set to 1000 for this set of problem instances, as preliminary test runs show that it is consistently better than other choices. Table 4 indicates that SQP-2QP is less efficient than Algorithm 4.2 for problem instances with large dimensionality, and where there is a significant number of functions $\epsilon$-active at the optimal solution. The last row in Table 4 shows that for problem instances with high dimensionality ($d \ge 10{,}000$), the storage of the $d \times d$ matrix $H_{p\Omega}(\cdot)$ for both SQP-2QP and Algorithm 4.2, with $B_{p\Omega}(\cdot)$ given by (47), causes both algorithms to terminate due to memory limitations. Thus, Algorithm 4.2, with $B_{p\Omega}(\cdot)$ given by (46), which does not have any matrix to store, may be a reasonable alternative for problem instances with large dimensionality.

6  Conclusions 

This paper focused on finite minimax problems with many functions, which may result from finely discretized semi-infinite minimax or optimal control problems. We conduct run-time complexity and rate of convergence analysis of smoothing algorithms for solving such problems and compare them with those of SQP algorithms. We find that smoothing algorithms may only have the sublinear rate of convergence $1/n$, where $n$ is the number of iterations. However, as shown by the complexity results, their slow rate of convergence may be compensated by small computational work per iteration, which is of order $O(q)$, where $q$ is the number of functions. We present two smoothing algorithms using exponential penalty functions with active-set strategies. The first algorithm is based on a recent smoothing algorithm, but uses a much simpler rule for precision adjustment. The second algorithm implements a novel line search rule that aims to ensure descent in the original objective function, as opposed to descent in the smoothed objective function that existing smoothing algorithms use. We provide a comprehensive numerical comparison between smoothing and SQP algorithms and find that the proposed algorithms are competitive, and especially efficient for large-scale minimax problems with a significant number of functions $\epsilon$-active at stationary points.

Appendix A. Problem Instances

Table 1 describes the problem instances used. Most columns are self-explanatory. Columns 2 and 3 give the number of variables $d$ and functions $q$, respectively. The target values (column 7) are equal to the optimal values (if known) or a slightly adjusted value from the optimal values reported in [6, 7] for smaller $q$. The same target values are used for ProbA-ProbM in Tables 2 and 3.

In this appendix, we denote components of $x \in \mathbb{R}^d$ by subscripts, i.e., $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$. When the problem is given in semi-infinite form, as in (80a)-(80i) below, the set $Y$ is discretized into $q$ equally spaced points if

$$\psi(x) = \max_{y \in Y} \phi(x, y), \tag{79a}$$

and $q/2$ equally spaced points if

$$\psi(x) = \max_{y \in Y} |\phi(x, y)|. \tag{79b}$$

ProbA is defined by (79a) and (80a) below, while ProbB-ProbI are defined by (79b) and (80b)-(80i) below, respectively.

$$\phi(x,y) = (2y^2 - 1)x + y(1 - y)(1 - x), \quad Y = [0, 1], \tag{80a}$$

$$\phi(x,y) = (1 - y^2) - (0.5x^2 - 2yx), \quad Y = [-1, 1], \tag{80b}$$

$$\phi(x,y) = y^2 - (y x_1 + x_2 \exp(y)), \quad Y = [0, 2], \tag{80c}$$

$$\phi(x,y) = \frac{1}{1+y} - x_1 \exp(y x_2), \quad Y = [-0.5, 0.5], \tag{80d}$$

$$\phi(x,y) = \sin y - (y^2 x_3 + y x_2 + x_1), \quad Y = [0, 1], \tag{80e}$$

$$\phi(x,y) = \exp(y) - \frac{x_1 + y x_2}{1 + y x_3}, \quad Y = [0, 1], \tag{80f}$$

$$\phi(x,y) = \sqrt{y} - [x_4 - (y^2 x_1 + y x_2 + x_3)^2], \quad Y = [0.25, 1], \tag{80g}$$

$$\phi(x,y) = \frac{1}{1+y} - [x_1 \exp(y x_3) + x_2 \exp(y x_4)], \quad Y = [-0.5, 0.5], \tag{80h}$$

$$\phi(x,y) = \frac{1}{1+y} - [x_1 \exp(y x_4) + x_2 \exp(y x_5) + x_3 \exp(y x_6)], \quad Y = [-0.5, 0.5]. \tag{80i}$$
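The discretization of a semi-infinite instance into a finite minimax problem can be sketched as follows. The sketch makes two assumptions that are not spelled out in the text: the equally spaced grid includes both endpoints of $Y$, and the absolute-value form is represented by the pair $\phi$ and $-\phi$ at each grid point. The names `discretize`, `psi`, and `phi_a` are hypothetical; `phi_a` follows (80a) with scalar $x$.

```python
def discretize(phi, lo, hi, q, absolute=False):
    """Build a list of functions f_j(x) from phi(x, y) on a uniform grid of [lo, hi].

    absolute=False: q grid points, one function per point.
    absolute=True : q // 2 grid points, two functions (phi, -phi) per point.
    """
    n = q // 2 if absolute else q
    if n < 2:
        raise ValueError("need at least two grid points")
    ys = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    fs = []
    for y in ys:
        fs.append(lambda x, y=y: phi(x, y))
        if absolute:
            fs.append(lambda x, y=y: -phi(x, y))
    return fs


def psi(fs, x):
    """Finite max-function over the discretized functions."""
    return max(f(x) for f in fs)


def phi_a(x, y):
    """Integrand of ProbA as in (80a), scalar x, Y = [0, 1]."""
    return (2 * y * y - 1) * x + y * (1 - y) * (1 - x)
```

In both cases the resulting finite problem has $q$ functions, which is what allows the number of functions to be scaled up freely for these instances.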

ProbJ-ProbM are defined by $\psi(x) = \max_{j \in Q} f^j(x)$, with $f^j(x)$ as given in (80j)-(80m) below, respectively.

$$f^j(x) = x_j^2, \quad j \in \{1, \ldots, q\}, \tag{80j}$$

$$f^j(x) = x_{(j-1)2+1}^2 + x_{2j}^2, \quad j \in \{1, \ldots, q\}, \tag{80k}$$

$$f^j(x) = x_{(j-1)4+1}^2 + x_{(j-1)4+2}^2 + x_{(j-1)4+3}^2 + x_{4j}^2, \quad j \in \{1, \ldots, q\}, \tag{80l}$$

$$f^j(x) = x_{k_j}^2 + x_{l_j}^2, \quad j \in \Big\{1, 2, 3, \ldots, \tbinom{d}{2}\Big\}, \tag{80m}$$

where $(k_j, l_j)$ are all the different 2-combinations (see Section 3.3 of [28]) of $\{1, 2, 3, \ldots, d\}$, and


$$f^j(x) = a_j x_i^2 + b_j x_i + c_j, \quad j \in \{1, \ldots, q\}, \tag{80n}$$

where $i = \lceil j/(q/d) \rceil$, and the constants $a_j$, $b_j$, $c_j$ are randomly generated from a uniform distribution on $[0.5, 1]$.
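The 2-combination indexing used in (80m) can be illustrated with a short sketch. It assumes 0-based indexing of the components of $x$ and the ordering produced by `itertools.combinations`; the function names are hypothetical.

```python
from itertools import combinations


def prob_m_functions(d):
    """One function x_k^2 + x_l^2 per 2-combination (k, l) of the d indices."""
    pairs = list(combinations(range(d), 2))
    return [lambda x, k=k, l=l: x[k] ** 2 + x[l] ** 2 for k, l in pairs]


def psi(fs, x):
    """Max-function over the generated functions."""
    return max(f(x) for f in fs)
```

For $d$ variables this yields $d(d-1)/2$ functions, so the number of functions grows quadratically with the dimension.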


Appendix  B.  Algorithm  Details  and  Parameters 

This  appendix  provides  details  on  the  algorithms  implemented. 

PPP. Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in [17]) with
$\alpha = 0.5$, $\beta = 0.8$, and $\delta = 1$. We use the same Armijo stepsize rule parameters $\alpha$ and $\beta$ for
all algorithms.
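For readers unfamiliar with the Armijo stepsize rule, the following is a minimal sketch of a standard backtracking variant using the parameter values quoted above. It is not taken from [17]: the function name, the `slope` argument (a negative estimate of the decrease rate along the search direction, such as a directional derivative or an optimality-function value), and the backtracking cap are assumptions made for illustration.

```python
def armijo_step(f, x, h, slope, alpha=0.5, beta=0.8, max_backtracks=60):
    """Return the largest step beta**k, k = 0, 1, ..., satisfying
    f(x + step * h) - f(x) <= alpha * step * slope, with slope < 0."""
    fx = f(x)
    step = 1.0
    for _ in range(max_backtracks):
        trial = [xi + step * hi for xi, hi in zip(x, h)]
        if f(trial) - fx <= alpha * step * slope:
            return step
        step *= beta          # shrink the step and try again
    raise RuntimeError("no acceptable step found")


if __name__ == "__main__":
    # f(x) = x^2 at x = 1 with steepest-descent direction h = -f'(x) = -2
    f = lambda x: x[0] ** 2
    print(armijo_step(f, [1.0], [-2.0], slope=-4.0))
```

For this quadratic the sufficient-decrease test holds exactly when the step is at most 0.5, so the rule backtracks until the step drops below that value.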

$\epsilon$-PPP. $\epsilon$-Active PPP algorithm (Algorithm 2.4.34 in [17] and the proof of convergence
in [23]) with the same parameters as above. The algorithm implemented is the more recent
version in [23], which uses the primal form of the optimality function. Preliminary
experiments show that the primal form is more efficient for large-scale problems with a large
number of functions than the equivalent dual form on page 176 of [17].

SQP-2QP. Sequential Quadratic Programming with two QPs in each iteration; see
Algorithm 2.1 of [7]. We use the algorithm parameters recommended in [7] as well as a monotone
line search. (We examined the use of the nonmonotone line search in CFSQP, but found it inferior
to the monotone line search on the set of problem instances and therefore implemented the latter
approach.)

SQP-1QP. Sequential Quadratic Programming with one QP in each iteration; see
Algorithm A in [9]. As there are no proposed parameter settings in [9], the algorithm parameters
used are the mid-point values stated in Algorithm A, $\alpha = 0.25$ ($\alpha$ in this algorithm is not
the Armijo parameter), $\tau = 2.5$, and matrix $H_0 =$ identity matrix. The same parameter
settings for $\alpha$ and $H_0$ are used by a co-author in a similar algorithm to solve the minimax
problem; see [29].

SMQN. Smoothing Quasi-Newton algorithm; see Algorithm 3.2 in [13]. There are
no proposed parameter settings in [13]. We adopt commonly used parameters from other
smoothing algorithms, $p_0 = 1$ and $B(\cdot) =$ identity matrix. For the Penalty-Parameter
Adjustment subroutine, which is the same as that in [6], we use Case (A) of [6], which is
shown to be comparable to Case (B).

Algorithm 4.1. The algorithm parameters used are the same as for SMQN, except
for the parameters in the different Adaptive Penalty Parameter Adjustment subroutine,
$\xi = 2$, $\varsigma = 2$.

Algorithm 4.2. The algorithm parameters used are $t = 10^{-5}$, $p_0 = 1$, $\hat{p} = (\log q / t) \cdot 10^{10}$,
$\kappa = 10^{30}$, $\alpha = 0.5$, $\beta = 0.8$, $\xi = 2$, $\gamma = t \cdot 10^{-10}$, $\nu = 0.5$, $\Delta p = 10$.
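To give a feel for the magnitudes involved, the sketch below evaluates the standard exponential (log-sum-exp) smoothing of a finite maximum, whose error is at most $\log(q)/p$, and the value $(\log q / t) \cdot 10^{10}$ listed among the parameters above. The function names are hypothetical, the logarithm is assumed to be natural, and the code is not the authors' implementation.

```python
import math


def smoothed_max(values, p):
    """(1/p) * log(sum_j exp(p * v_j)), shifted by max(values) for stability."""
    m = max(values)
    return m + math.log(sum(math.exp(p * (v - m)) for v in values)) / p


def gap_bound(q, p):
    """Upper bound log(q)/p on smoothed_max(values, p) - max(values)."""
    return math.log(q) / p


def scaled_precision(q, t):
    """The value (log q / t) * 10^10 from the parameter list (natural log assumed)."""
    return math.log(q) / t * 1e10


if __name__ == "__main__":
    vals = [0.0, 1.0, 1.0]
    for p in (1.0, 10.0, 1e6):
        print(p, smoothed_max(vals, p) - max(vals), gap_bound(len(vals), p))
    print(scaled_precision(q=1000, t=1e-5))
```

With $p = \log(q)/t$ the bound equals the tolerance $t$, so the listed value is ten orders of magnitude larger than what is needed to bring the smoothing error below $t$.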
Table 1: Problem instances. An asterisk * indicates that the problem instance is created by the authors.

Table 2: Run times (in seconds) for various algorithms. The word “local” means that the algorithm converges to a locally optimal solution

Table 3: Similar results as in Table 2, but with larger $q$. The word “local” means that the algorithm converges to a locally optimal solution

Table 4: Run times (in seconds) of algorithms on problem instance ProbN. “SD” and “QN” indicate
that Algorithm 4.2 uses $B_{pq}(\cdot)$ given by (46) and (47), respectively. The word “mem” indicates
that the algorithm terminates due to insufficient memory.